Hi folks,
Another ext3 foobar on 4.7-rc6. The test VM hung when the rootfs ran
out of space. mountinfo:
16 0 8:1 / / rw,relatime shared:1 - ext3 /dev/root rw,errors=remount-ro,data=ordered
After reboot, df:
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 9696448 9696420 0 100% /
I then ran:
$ rm -rf /mnt/scratch
to clean up some mess left by xfstests. This returned huge numbers of
EPERM errors (expected, as files were created by root), but then the
rm -rf process segfaulted. On the console:
[ 26.275026] ------------[ cut here ]------------
[ 26.275672] kernel BUG at fs/ext4/xattr.c:1331!
[ 26.276231] invalid opcode: 0000 [#1] PREEMPT SMP
[ 26.276820] Modules linked in:
[ 26.277226] CPU: 0 PID: 3127 Comm: rm Not tainted 4.7.0-rc6-dgc+ #839
[ 26.278014] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[ 26.279103] task: ffff880336dda3c0 ti: ffff880339740000 task.ti: ffff880339740000
[ 26.280033] RIP: 0010:[<ffffffff81310dfb>] [<ffffffff81310dfb>] ext4_xattr_shift_entries+0x5b/0x60
[ 26.281165] RSP: 0018:ffff880339743cf8 EFLAGS: 00010202
[ 26.281825] RAX: 000000000030000e RBX: ffff88013ab73740 RCX: ffff88013a295f9c
[ 26.282708] RDX: 0000000000000000 RSI: 000000000000000c RDI: ffff88013a295fa0
[ 26.283595] RBP: ffff880339743cf8 R08: ffffffffffffffd0 R09: 0000000000001000
[ 26.284466] R10: 000000000000000e R11: ffff88013a295fa0 R12: ffff8800bae366c0
[ 26.285335] R13: 0000000000000000 R14: 000000000000000a R15: ffff880139c895b0
[ 26.286201] FS: 00007f1805169700(0000) GS:ffff88013bc00000(0000) knlGS:0000000000000000
[ 26.287181] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 26.287890] CR2: 00007fffedd68f94 CR3: 00000000ba973000 CR4: 00000000000006f0
[ 26.288757] Stack:
[ 26.289015] ffff880339743de0 ffffffff8131326d 000000000000001c ffff880139c894d8
[ 26.289974] ffff88013b6a54e0 0000000000000ebc ffff880000c02000 ffff880339743da0
[ 26.290934] ffff88013a295f00 0000000000000000 000000000000005e ffff88013a295fa0
[ 26.291901] Call Trace:
[ 26.292216] [<ffffffff8131326d>] ext4_expand_extra_isize_ea+0x3ad/0x810
[ 26.293033] [<ffffffff812de361>] ? ext4_unlink+0x341/0x380
[ 26.293709] [<ffffffff812d0e6c>] ext4_mark_inode_dirty+0x1cc/0x230
[ 26.294470] [<ffffffff812de361>] ext4_unlink+0x341/0x380
[ 26.295126] [<ffffffff81203211>] vfs_unlink+0xf1/0x180
[ 26.295783] [<ffffffff81207839>] do_unlinkat+0x259/0x2d0
[ 26.296442] [<ffffffff812082ab>] SyS_unlinkat+0x1b/0x30
[ 26.297096] [<ffffffff81e3cc72>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[ 26.297876] Code: 77 29 66 44 89 57 02 0f b6 07 48 83 c0 13 48 83 e0 fc 48 01 c7 8b 07 85 c0 75 c9 4c 89 c2 48 89 ce 4c 89 df e8 67 e8 4e 00 5d c3 <0f> 0b 0f 1f 00
[ 26.301236] RIP [<ffffffff81310dfb>] ext4_xattr_shift_entries+0x5b/0x60
[ 26.302068] RSP <ffff880339743cf8>
[ 26.302562] ---[ end trace cc18c7e6935b8a49 ]---
Filesystem checked clean during the boot before it hit ENOSPC.
I didn't check on reboot before this happened. After another reboot:
# e2fsck -f /dev/sda1
e2fsck 1.43-WIP (18-May-2015)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sda1: 293892/624624 files (3.0% non-contiguous), 2496091/2496091 blocks
#
Filesystem claims it is clean, but it's still at ENOSPC.
Remount as rw, then as a user run:
dave@test4:~$ rm -rf /mnt/scratch
rm: cannot remove '/mnt/scratch/dir5/fname1': Permission denied
rm: cannot remove '/mnt/scratch/dir5/sd2': Permission denied
rm: cannot remove '/mnt/scratch/dir5/ed2': Permission denied
.....
[ 182.524593] ------------[ cut here ]------------
[ 182.525295] kernel BUG at fs/ext4/xattr.c:1331!
[ 182.525906] invalid opcode: 0000 [#1] PREEMPT SMP
[ 182.526655] Modules linked in:
[ 182.527132] CPU: 0 PID: 4001 Comm: rm Not tainted 4.7.0-rc6-dgc+ #839
[ 182.528031] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[ 182.529174] task: ffff88013a990000 ti: ffff880338f30000 task.ti: ffff880338f30000
[ 182.530278] RIP: 0010:[<ffffffff81310dfb>] [<ffffffff81310dfb>] ext4_xattr_shift_entries+0x5b/0x60
[ 182.531615] RSP: 0018:ffff880338f33cf8 EFLAGS: 00010202
[ 182.532313] RAX: 000000000030000e RBX: ffff8800baf43640 RCX: ffff88033060cb9c
[ 182.533379] RDX: 0000000000000000 RSI: 000000000000000c RDI: ffff88033060cba0
[ 182.534317] RBP: ffff880338f33cf8 R08: ffffffffffffffd0 R09: 0000000000001000
[ 182.535377] R10: 000000000000000e R11: ffff88033060cba0 R12: ffff88013a804000
[ 182.536297] R13: 0000000000000000 R14: 000000000000000a R15: ffff880327109ad0
[ 182.537348] FS: 00007f7513549700(0000) GS:ffff88013bc00000(0000) knlGS:0000000000000000
[ 182.538397] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 182.539286] CR2: 0000000000638088 CR3: 00000000bb216000 CR4: 00000000000006f0
[ 182.540208] Stack:
[ 182.540617] ffff880338f33de0 ffffffff8131326d 000000000000001c ffff8803271099f8
[ 182.541943] ffff8800bb6af9c0 0000000000000ebc ffff8800bb665000 ffff880338f33da0
[ 182.543317] ffff88033060cb00 0000000000000000 000000000000005e ffff88033060cba0
[ 182.544611] Call Trace:
[ 182.545008] [<ffffffff8131326d>] ext4_expand_extra_isize_ea+0x3ad/0x810
[ 182.546016] [<ffffffff812de361>] ? ext4_unlink+0x341/0x380
[ 182.546750] [<ffffffff812d0e6c>] ext4_mark_inode_dirty+0x1cc/0x230
[ 182.547699] [<ffffffff812de361>] ext4_unlink+0x341/0x380
[ 182.548426] [<ffffffff81203211>] vfs_unlink+0xf1/0x180
[ 182.549249] [<ffffffff81207839>] do_unlinkat+0x259/0x2d0
[ 182.549970] [<ffffffff812082ab>] SyS_unlinkat+0x1b/0x30
[ 182.550815] [<ffffffff81e3cc72>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[ 182.551663] Code: 77 29 66 44 89 57 02 0f b6 07 48 83 c0 13 48 83 e0 fc 48 01 c7 8b 07 85 c0 75 c9 4c 89 c2 48 89 ce 4c 89 df e8 67 e8 4e 00 5d c3 <0f> 0b 0f 1f 00
[ 182.557820] RIP [<ffffffff81310dfb>] ext4_xattr_shift_entries+0x5b/0x60
[ 182.558849] RSP <ffff880338f33cf8>
[ 182.559476] ---[ end trace 84ae2f59660ff3c6 ]---
Not sure why it is trying to expand EA space in the inode on unlink,
but that's what it's trying to do and it's bugging out on it.
So, by pure chance, the third file I manually tried to remove:
$ ls -l /mnt/scratch/aligned_vector_rw
-rw------- 1 root root 104857600 Jul 18 11:08 aligned_vector_rw
$ sudo rm /mnt/scratch/aligned_vector_rw
[ 192.407586] ------------[ cut here ]------------
[ 192.408765] kernel BUG at fs/ext4/xattr.c:1331!
[ 192.409949] invalid opcode: 0000 [#1] PREEMPT SMP
[ 192.410976] Modules linked in:
[ 192.411691] CPU: 0 PID: 4521 Comm: rm Not tainted 4.7.0-rc6-dgc+ #839
[ 192.413083] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[ 192.415011] task: ffff88023aa04780 ti: ffff880238cd4000 task.ti: ffff880238cd4000
[ 192.416624] RIP: 0010:[<ffffffff81310dfb>] [<ffffffff81310dfb>] ext4_xattr_shift_entries+0x5b/0x60
[ 192.418599] RSP: 0018:ffff880238cd7cf8 EFLAGS: 00010202
[ 192.419676] RAX: 000000000030000e RBX: ffff88013aa27e40 RCX: ffff8802391dbf9c
[ 192.421042] RDX: 0000000000000000 RSI: 000000000000000c RDI: ffff8802391dbfa0
[ 192.422386] RBP: ffff880238cd7cf8 R08: ffffffffffffffd0 R09: 0000000000001000
[ 192.423724] R10: 000000000000000e R11: ffff8802391dbfa0 R12: ffff88023b803b40
[ 192.425058] R13: 0000000000000000 R14: 000000000000000a R15: ffff880239672a30
[ 192.426407] FS: 00007fa04a4f5700(0000) GS:ffff88013bc00000(0000) knlGS:0000000000000000
[ 192.427915] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 192.428933] CR2: 00000000006120a8 CR3: 00000000bb2a6000 CR4: 00000000000006f0
[ 192.430219] Stack:
[ 192.430591] ffff880238cd7de0 ffffffff8131326d 000000000000001c ffff880239672958
[ 192.431975] ffff88023b25f7b8 0000000000000ebc ffff8800bb5dd000 ffff880238cd7da0
[ 192.433370] ffff8802391dbf00 0000000000000000 000000000000005e ffff8802391dbfa0
[ 192.434726] Call Trace:
[ 192.435152] [<ffffffff8131326d>] ext4_expand_extra_isize_ea+0x3ad/0x810
[ 192.436246] [<ffffffff812de361>] ? ext4_unlink+0x341/0x380
[ 192.437120] [<ffffffff812d0e6c>] ext4_mark_inode_dirty+0x1cc/0x230
[ 192.438113] [<ffffffff812de361>] ext4_unlink+0x341/0x380
[ 192.438961] [<ffffffff81203211>] vfs_unlink+0xf1/0x180
[ 192.439783] [<ffffffff81207839>] do_unlinkat+0x259/0x2d0
[ 192.440632] [<ffffffff812082ab>] SyS_unlinkat+0x1b/0x30
[ 192.441469] [<ffffffff81e3cc72>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[ 192.442489] Code: 77 29 66 44 89 57 02 0f b6 07 48 83 c0 13 48 83 e0 fc 48 01 c7 8b 07 85 c0 75 c9 4c 89 c2 48 89 ce 4c 89 df e8 67 e8 4e 00 5d c3 <0f> 0b 0f 1f 00
[ 192.446750] RIP [<ffffffff81310dfb>] ext4_xattr_shift_entries+0x5b/0x60
[ 192.447760] RSP <ffff880238cd7cf8>
[ 192.448330] ---[ end trace 4c5fd2f472bea26f ]---
So I rebooted again, and immediately ran:
# rm /mnt/scratch/aligned_vector_rw
And it succeeded without oopsing. Yay? Then I tried again as root
to run 'rm /mnt/scratch/*' and it oopsed on some other file....
-Dave.
--
Dave Chinner
[email protected]
On Mon, Jul 18, 2016 at 12:23:56PM +1000, Dave Chinner wrote:
> Filesystem claims it is clean, but it's still at ENOSPC.
Filesystem is clean and no longer at ENOSPC (1.7GB free), but I still see
this problem. I narrowed it down to this set of files:
# ls -l /mnt/scratch
total 64
-rw-r--r-- 1 root root 45056 Nov 10 2015 fsr_test_file.27768.14.10
-rw-r--r-- 1 root root 49152 Nov 10 2015 fsr_test_file.27768.14.11
-rw-r--r-- 1 root root 53248 Nov 10 2015 fsr_test_file.27768.14.12
-rw-r--r-- 1 root root 57344 Nov 10 2015 fsr_test_file.27768.14.13
-rw-r--r-- 1 root root 61440 Nov 10 2015 fsr_test_file.27768.14.14
-rw-r--r-- 1 root root 65536 Nov 10 2015 fsr_test_file.27768.14.15
-rw-r--r-- 1 root root 69632 Nov 10 2015 fsr_test_file.27768.14.16
-rw-r--r-- 1 root root 73728 Nov 10 2015 fsr_test_file.27768.14.17
-rw-r--r-- 1 root root 77824 Nov 10 2015 fsr_test_file.27768.14.18
-rw-r--r-- 1 root root 81920 Nov 10 2015 fsr_test_file.27768.14.19
-rw-r--r-- 1 root root 86016 Nov 10 2015 fsr_test_file.27768.14.20
-rw-r--r-- 1 root root 24576 Nov 10 2015 fsr_test_file.27768.14.5
-rw-r--r-- 1 root root 28672 Nov 10 2015 fsr_test_file.27768.14.6
-rw-r--r-- 1 root root 32768 Nov 10 2015 fsr_test_file.27768.14.7
-rw-r--r-- 1 root root 36864 Nov 10 2015 fsr_test_file.27768.14.8
-rw-r--r-- 1 root root 40960 Nov 10 2015 fsr_test_file.27768.14.9
# rm -rf /mnt/scratch/fsr_test_file.27768.14.5
[ 399.018116] ------------[ cut here ]------------
[ 399.018916] kernel BUG at fs/ext4/xattr.c:1331!
[ 399.019567] invalid opcode: 0000 [#1] PREEMPT SMP
[ 399.020259] Modules linked in:
[ 399.020723] CPU: 12 PID: 5007 Comm: rm Not tainted 4.7.0-rc6-dgc+ #839
[ 399.021660] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[ 399.022901] task: ffff88033aa28000 ti: ffff88033a90c000 task.ti: ffff88033a90c000
[ 399.023967] RIP: 0010:[<ffffffff81310dfb>] [<ffffffff81310dfb>] ext4_xattr_shift_entries+0x5b/0x60
[ 399.025248] RSP: 0018:ffff88033a90fcf8 EFLAGS: 00010202
[ 399.025937] RAX: 000000000030000e RBX: ffff88013aa13500 RCX: ffff88033764319c
[ 399.026878] RDX: 0000000000000000 RSI: 000000000000000c RDI: ffff8803376431a0
[ 399.027807] RBP: ffff88033a90fcf8 R08: ffffffffffffffd0 R09: 0000000000001000
[ 399.028750] R10: 000000000000000e R11: ffff8803376431a0 R12: ffff88023b7a25a0
[ 399.029693] R13: 0000000000000000 R14: 000000000000000a R15: ffff88013b54ce10
[ 399.030617] FS: 00007f741d174700(0000) GS:ffff88013bd80000(0000) knlGS:0000000000000000
[ 399.031713] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 399.032485] CR2: 00000000006111c8 CR3: 000000013ab14000 CR4: 00000000000006e0
[ 399.033431] Stack:
[ 399.033700] ffff88033a90fde0 ffffffff8131326d 000000000000001c ffff88013b54cd38
[ 399.034750] ffff8800b99541a0 0000000000000f04 ffff8800bb6e2000 ffff88033a90fda0
[ 399.035791] ffff880337643100 0000000000000000 000000000000005e ffff8803376431a0
[ 399.036853] Call Trace:
[ 399.037197] [<ffffffff8131326d>] ext4_expand_extra_isize_ea+0x3ad/0x810
[ 399.038091] [<ffffffff812de361>] ? ext4_unlink+0x341/0x380
[ 399.038836] [<ffffffff812d0e6c>] ext4_mark_inode_dirty+0x1cc/0x230
[ 399.039674] [<ffffffff812de361>] ext4_unlink+0x341/0x380
[ 399.040477] [<ffffffff81203211>] vfs_unlink+0xf1/0x180
[ 399.041263] [<ffffffff81207839>] do_unlinkat+0x259/0x2d0
[ 399.042079] [<ffffffff812082ab>] SyS_unlinkat+0x1b/0x30
[ 399.042837] [<ffffffff81e3cc72>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[ 399.043698] Code: 77 29 66 44 89 57 02 0f b6 07 48 83 c0 13 48 83 e0 fc 48 01 c7 8b 07 85 c0 75 c9 4c 89 c2 48 89 ce 4c 89 df e8 67 e8 4e 00 5d c3 <0f> 0b 0f 1f 00
[ 399.047385] RIP [<ffffffff81310dfb>] ext4_xattr_shift_entries+0x5b/0x60
[ 399.048272] RSP <ffff88033a90fcf8>
[ 399.048772] ---[ end trace 1b8726b74fc862c0 ]---
Segmentation fault
#
So, what are those files? They are created by an xfs_fsr
defragmentation test that exercises different data and attribute
extent fork sizes - xfs/227 to be precise. An XFS change I was
testing caused the scratch filesystem to fail to mount (due to an
oops on unmount on a prior test), so it created the files on the
filesystem underlying the scratch mount point(*).
The "14.N" in the file name tells us that there are 14 "2 byte
attributes" created on the file, and N is the number of data extents
on the file. It would appear that creating 14x2 byte attributes and
5 or more data extents causes a corner case problem with ext3 that
isn't realised until we attempt to unlink the files. I can't
remove any of the above files without triggering the BUG.
# getfattr -d -e hex /mnt/scratch/fsr_test_file.27768.14.6
getfattr: Removing leading '/' from absolute path names
# file: mnt/scratch/fsr_test_file.27768.14.6
user.0=0xbabe
user.1=0xbabe
user.10=0xbabe
user.11=0xbabe
user.12=0xbabe
user.13=0xbabe
user.14=0xbabe
user.2=0xbabe
user.3=0xbabe
user.4=0xbabe
user.5=0xbabe
user.6=0xbabe
user.7=0xbabe
user.8=0xbabe
user.9=0xbabe
#
# setfattr -x user.0 /mnt/scratch/fsr_test_file.27768.14.6
# getfattr -d -e hex /mnt/scratch/fsr_test_file.27768.14.6
getfattr: Removing leading '/' from absolute path names
# file: mnt/scratch/fsr_test_file.27768.14.6
user.1=0xbabe
user.10=0xbabe
user.11=0xbabe
user.12=0xbabe
user.13=0xbabe
user.14=0xbabe
user.2=0xbabe
user.3=0xbabe
user.4=0xbabe
user.5=0xbabe
user.6=0xbabe
user.7=0xbabe
user.8=0xbabe
user.9=0xbabe
# rm !$
rm /mnt/scratch/fsr_test_file.27768.14.6
#
And, by removing an attribute, I can successfully remove the file.
So this definitely looks like a corner case xattr handling issue in
ext3/4.
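
For anyone who wants to poke at this, the attribute layout shown by
getfattr above can be recreated roughly as follows. This is a sketch
only: make_xattr_file and DRY_RUN are names made up for illustration,
it assumes setfattr(1) from the attr package and a filesystem with
user xattrs enabled, and it does not recreate the data extent layout
that xfs/227 produces, so it may well not trigger the BUG by itself.

```shell
#!/bin/sh
# Hypothetical helper: create (or truncate) FILE and give it the
# attributes user.0 .. user.LAST, each with the 2-byte value 0xbabe,
# as in the getfattr listing. LAST defaults to 14.
# With DRY_RUN set to a non-empty value, only print the commands.
make_xattr_file() {
    file=$1
    last=${2:-14}
    if [ -z "${DRY_RUN:-}" ]; then
        : > "$file" || return 1
    fi
    i=0
    while [ "$i" -le "$last" ]; do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "setfattr -n user.$i -v 0xbabe $file"
        else
            setfattr -n "user.$i" -v 0xbabe "$file" || return 1
        fi
        i=$((i + 1))
    done
}
```

Running "DRY_RUN=1 make_xattr_file /mnt/scratch/testfile" prints the
setfattr commands without touching the filesystem, which is a cheap
way to check the loop before pointing it at a real scratch directory.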
Cheers,
Dave.
(*) This is a prime example of why xfstests should not stop running tests when
it fails to mount a scratch filesystem. I'd have never hit this
problem if we prevented xfs/227 from running because something else
had already gone wrong....
--
Dave Chinner
[email protected]
On 7/17/16 8:02 PM, Dave Chinner wrote:
> # rm !$
> rm /mnt/scratch/fsr_test_file.27768.14.6
> #
>
> And, by removing an attribute, I can successfully remove the file.
> So this definitely looks like a corner case xattr handling issue in
> ext3/4.
I told xfs/227 that it could run on ext3 and ran it, but this
didn't reproduce for me.
Can you provide dumpe2fs -h output for the root fs? This might depend on
inode size etc.
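
The fields of interest can be pulled out of that output with something
like the sketch below. inode_geometry is a name invented here, and the
labels it matches are the ones dumpe2fs prints for the inode size and
the two extra isize superblock fields; treat the output variable names
as illustrative.

```shell
#!/bin/sh
# Hypothetical helper: read `dumpe2fs -h` output on stdin and print
# the inode size and the required/desired extra isize values.
inode_geometry() {
    awk -F': *' '
        $1 == "Inode size"           { print "inode_size=" $2 }
        $1 == "Required extra isize" { print "min_extra_isize=" $2 }
        $1 == "Desired extra isize"  { print "want_extra_isize=" $2 }
    '
}
```

Typical use would be "dumpe2fs -h /dev/sda1 2>/dev/null | inode_geometry",
run as root against the device in question.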
Thanks,
-Eric
On Sun, Jul 17, 2016 at 09:07:16PM -0700, Eric Sandeen wrote:
> On 7/17/16 8:02 PM, Dave Chinner wrote:
> > # rm !$
> > rm /mnt/scratch/fsr_test_file.27768.14.6
> > #
> >
> > And, by removing an attribute, I can successfully remove the file.
> > So this definitely looks like a corner case xattr handling issue in
> > ext3/4.
>
> I told xfs/227 that it could run on ext3 and ran it, but this
> didn't reproduce for me.
>
> Can you provide a dumpe2fs -h for the root fs, this might depend on
> inode size etc.
# dumpe2fs -h /dev/sda1
dumpe2fs 1.43-WIP (18-May-2015)
Filesystem volume name: <none>
Last mounted on: /
Filesystem UUID: b21615e5-fe8a-4ffc-ab80-c24cdc8b740a
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 624624
Block count: 2496091
Reserved block count: 124804
Free blocks: 567319
Free inodes: 352653
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 609
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8112
Inode blocks per group: 507
Filesystem created: Thu Mar 25 18:10:55 2010
Last mount time: Tue Jul 19 01:21:57 2016
Last write time: Tue Jul 19 01:21:57 2016
Mount count: 10
Maximum mount count: 27
Last checked: Mon Jul 18 21:59:01 2016
Check interval: 15552000 (6 months)
Next check after: Sat Jan 14 22:59:01 2017
Lifetime writes: 13 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
First orphan inode: 219355
Default directory hash: half_md4
Directory Hash Seed: 740ffa95-af8d-4e89-b68c-5e768a27ece3
Journal backup: inode blocks
Journal features: journal_incompat_revoke
Journal size: 128M
Journal length: 32768
Journal sequence: 0x01c975b5
Journal start: 12
-Dave.
--
Dave Chinner
[email protected]
On Mon 18-07-16 15:24:47, Dave Chinner wrote:
> # dumpe2fs -h /dev/sda1
> dumpe2fs 1.43-WIP (18-May-2015)
> Filesystem volume name: <none>
> Last mounted on: /
> Filesystem UUID: b21615e5-fe8a-4ffc-ab80-c24cdc8b740a
> Filesystem magic number: 0xEF53
> Filesystem revision #: 1 (dynamic)
> Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
> Filesystem flags: signed_directory_hash
> Default mount options: (none)
> Filesystem state: clean
> Errors behavior: Continue
> Filesystem OS type: Linux
> Inode count: 624624
> Block count: 2496091
> Reserved block count: 124804
> Free blocks: 567319
> Free inodes: 352653
> First block: 0
> Block size: 4096
> Fragment size: 4096
> Reserved GDT blocks: 609
> Blocks per group: 32768
> Fragments per group: 32768
> Inodes per group: 8112
> Inode blocks per group: 507
> Filesystem created: Thu Mar 25 18:10:55 2010
> Last mount time: Tue Jul 19 01:21:57 2016
> Last write time: Tue Jul 19 01:21:57 2016
> Mount count: 10
> Maximum mount count: 27
> Last checked: Mon Jul 18 21:59:01 2016
> Check interval: 15552000 (6 months)
> Next check after: Sat Jan 14 22:59:01 2017
> Lifetime writes: 13 GB
> Reserved blocks uid: 0 (user root)
> Reserved blocks gid: 0 (group root)
> First inode: 11
> Inode size: 256
> Required extra isize: 28
> Desired extra isize: 28
> Journal inode: 8
> First orphan inode: 219355
> Default directory hash: half_md4
> Directory Hash Seed: 740ffa95-af8d-4e89-b68c-5e768a27ece3
> Journal backup: inode blocks
> Journal features: journal_incompat_revoke
> Journal size: 128M
> Journal length: 32768
> Journal sequence: 0x01c975b5
> Journal start: 12
Thanks for the report! So I see at least part of what happened: your filesystem
was created with 'extra inode size' 28, and your inodes were likely created
with this amount of space reserved in the extended attribute area of the
inode because you created them with some older kernel (which means
it had to be a kernel prior to commit 8b4953e13f4c, which landed in
4.4-rc5, because newer kernels automatically reserve 32 bytes in the
inode, not 28 as specified by the superblock).
The above-mentioned commit added a project ID to the inode, so new kernels
now ask for 32 bytes in the extended attribute area. So when you tried to
modify the inode with a newer kernel, we tried to shift extended
attributes around to make space for those additional 4 bytes. That
explains why Eric was not able to reproduce the issue.
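
The arithmetic behind those 4 bytes can be sketched as follows. This is
an illustration only: the function names are made up, 128 is
EXT4_GOOD_OLD_INODE_SIZE, and 256, 28 and 32 are the inode size and
extra isize values discussed above. It only computes the raw space left
after the fixed inode fields and ignores how the xattr header and
entries are laid out inside it.

```shell
#!/bin/sh
# Size of the original fixed part of the on-disk inode
# (EXT4_GOOD_OLD_INODE_SIZE in the kernel).
GOOD_OLD_INODE_SIZE=128

# Bytes left in the inode body after the fixed part and the extra
# fields: inode_size - 128 - extra_isize.
# $1 = inode size, $2 = extra isize
ibody_xattr_space() {
    echo $(( $1 - GOOD_OLD_INODE_SIZE - $2 ))
}

# Bytes by which in-inode xattrs have to move when the extra isize
# grows from $1 to $2.
isize_shift() {
    echo $(( $2 - $1 ))
}

ibody_xattr_space 256 28   # inode created by an old kernel: 100
ibody_xattr_space 256 32   # what a new kernel wants: 96
isize_shift 28 32          # bytes to make room for: 4
```

So an inode whose in-inode xattr area is already full at 100 bytes has
nowhere to put the displaced 4 bytes, and the expansion code has to
shuffle attributes around to cope.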
I've tried creating a file with an old kernel and deleting it with a new one
and the BUG_ON indeed triggers. Going through ext4_expand_extra_isize_ea() I
see so many bugs that it's not nice. I guess we should add some inode size
expansion tests...
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Thu, Jul 28, 2016 at 03:54:32PM +0200, Jan Kara wrote:
> Thanks for report! So I see at least part of what happened: Your filesystem
> was created with 'extra inode size' 28 and likely your inodes were created
> with this amount of space reserved in the extended attribute area of the
> inode because you still created them with some older kernel (but that means
> that it had to be a kernel prior to commit 8b4953e13f4c which landed in
> 4.4-rc5 because newer kernels would automatically reserve 32-bytes in the
> inode, not 28 as specified by the superblock).
Well, yes, the filesystem was made prior to 4.4-rc5. Only by a
little - it was made back in January 2010 and has been in use ever
since. :P
> The above mentioned commit has added project ID to the inode so new kernels
> now ask for 32 bytes in the extended attribute area. So when you tried to
> modify the inode with newer kernel, we were trying to shift extended
> attributes around to make space for those additional 4 bytes. So that makes
> it clear why Eric was not able to reproduce the issue.
Gotcha.
> I've tried creating file with an old kernel and deleting it with a new one
> and the bugon indeed triggers. Going through ext4_expand_extra_isize_ea() I
> see so many bugs that it's not nice. I guess we should add some inode size
> expansion tests...
Ouch. At least the problem is understood now - any idea on how long
it might take to fix?
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Fri 29-07-16 10:21:12, Dave Chinner wrote:
> Ouch. At least the problem is understood now - any idea on how long
> it might take to fix?
I have written some fixes, working on testing them... So hopefully I can
submit them later today or next week.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
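
To make the arithmetic in the explanation above concrete, here is a minimal
user-space sketch (not kernel code). It assumes the 256-byte inode from the
superblock dump and the 128-byte base inode (EXT4_GOOD_OLD_INODE_SIZE). The
helper names are hypothetical and exist only for illustration.

```c
/* Size of the original on-disk inode; mirrors EXT4_GOOD_OLD_INODE_SIZE. */
#define GOOD_OLD_INODE_SIZE 128u

/* Hypothetical helper: byte offset where the in-inode xattr area begins. */
unsigned int xattr_area_start(unsigned int extra_isize)
{
	return GOOD_OLD_INODE_SIZE + extra_isize;
}

/* Hypothetical helper: bytes of the inode left over for in-inode xattrs. */
unsigned int xattr_area_size(unsigned int inode_size, unsigned int extra_isize)
{
	return inode_size - xattr_area_start(extra_isize);
}

/*
 * Hypothetical helper: how many bytes existing xattrs must move when the
 * "extra" inode fields grow from old_extra to new_extra bytes.
 */
unsigned int xattr_shift_needed(unsigned int old_extra, unsigned int new_extra)
{
	return new_extra > old_extra ? new_extra - old_extra : 0;
}
```

With a 256-byte inode, growing the extra isize from 28 to 32 moves the start
of the xattr area from byte 156 to byte 160 and shrinks it from 100 to 96
bytes, so any xattrs already stored there have to be shifted by 4 bytes. That
shift is the code path that hit the BUG.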
On Fri, Jul 29, 2016 at 08:42:10AM +0200, Jan Kara wrote:
>
> I have written some fixes, working on testing them... So hopefully I can
> submit them later today or next week.
How complicated are the fixes? This should work as a temporary
workaround and it is going to be very simple to backport.
- Ted
commit e92d5b1cf6af642bc5018562e58044fd771f82bf
Author: Theodore Ts'o <[email protected]>
Date:   Sun Jul 31 23:08:28 2016 -0400

    ext4: suppress inode growth for file systems w/o project quota

    We have not added a new "extra" inode field in a very long time, and
    the code to deal with moving the extended attributes down to make room
    for the extra inode fields has bitrotted and can cause kernel BUGS.

    Very few people will likely use project quotas (and those that do will
    likely be creating a new file system from scratch) so as a temporary
    palliative until we are sure the inode expansion code is all sane,
    let's not try to expand the inode "extra" fields if the project
    feature has not been enabled.

    Cc: [email protected] # v4.4+
    Signed-off-by: Theodore Ts'o <[email protected]>

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index c13a4e4..e2622ba 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3968,6 +3968,17 @@ no_journal:
 	if (sbi->s_inode_size > EXT4_GOOD_OLD_INODE_SIZE) {
 		sbi->s_want_extra_isize = sizeof(struct ext4_inode) -
 						     EXT4_GOOD_OLD_INODE_SIZE;
+		/*
+		 * This is a temporary hack; we have not added a new
+		 * "extra" inode field in a long time, and the code to
+		 * expand the inode structure has bitrotted and can
+		 * cause kernel BUG's.  If the file system does not
+		 * have the project feature, let's not try to expand
+		 * the inode's extra size as a temporary palliative
+		 * until we can fix up the inode expansion code.
+		 */
+		if (!ext4_has_feature_project(sb))
+			sbi->s_want_extra_isize -= sizeof(__le32);
 		if (ext4_has_feature_extra_isize(sb)) {
 			if (sbi->s_want_extra_isize <
 			    le16_to_cpu(es->s_want_extra_isize))
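
For readers who want to see what the hunk above computes, here is a rough
user-space model of it (a sketch, not the kernel function). The function name
and parameters are hypothetical. The value 160 for sizeof(struct ext4_inode)
is an assumption derived from the "32 bytes" figure mentioned earlier in the
thread, and the clamp against the superblock value is assumed from the
truncated context lines of the diff.

```c
#include <stdbool.h>
#include <stdint.h>

#define GOOD_OLD_INODE_SIZE 128u   /* mirrors EXT4_GOOD_OLD_INODE_SIZE */
#define FULL_INODE_STRUCT_SIZE 160u /* assumed sizeof(struct ext4_inode) */

/*
 * Hypothetical model of how the mount code picks the wanted extra isize
 * once the workaround is applied.
 *   inode_size      - on-disk inode size from the superblock
 *   has_project     - stands in for ext4_has_feature_project(sb)
 *   has_extra_isize - stands in for ext4_has_feature_extra_isize(sb)
 *   sb_want         - stands in for es->s_want_extra_isize
 */
unsigned int want_extra_isize(unsigned int inode_size, bool has_project,
			      bool has_extra_isize, unsigned int sb_want)
{
	unsigned int want = 0;

	if (inode_size > GOOD_OLD_INODE_SIZE) {
		want = FULL_INODE_STRUCT_SIZE - GOOD_OLD_INODE_SIZE;
		/* The workaround: skip the 4-byte project ID field. */
		if (!has_project)
			want -= sizeof(uint32_t);
		/* Assumed: never go below what the superblock asks for. */
		if (has_extra_isize && want < sb_want)
			want = sb_want;
	}
	return want;
}
```

For the reported filesystem (inode size 256, no project feature, desired extra
isize 28) this model yields 28, which matches what the old inodes already
have, so no expansion and no xattr shifting would be attempted. With the
project feature enabled it yields 32 and the expansion path is still taken.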
On Sun 31-07-16 23:09:09, Ted Tso wrote:
> On Fri, Jul 29, 2016 at 08:42:10AM +0200, Jan Kara wrote:
> >
> > I have written some fixes, working on testing them... So hopefully I can
> > submit them later today or next week.
>
> How complicated are the fixes? This should work as a temporary
> workaround and it is going to be very simple to backport.
Relatively simple but there are still some bugs lurking - I have the
original issue fixed but I have implemented xfstests which stress the inode
expansion code in various ways and so far they are still able to crash the
kernel... Tomorrow I will be looking more into it and hopefully will be
able to finish the fixes.
Regarding your patch - I think it's a reasonable band-aid although I'll
have to backport proper fixes to 4.4 anyway (it's a base of our next
enterprise release), which is where i_projid got introduced as well, so
eventually we will have proper stable fixes for all relevant kernels
anyway.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon, Aug 01, 2016 at 02:19:30PM +0200, Jan Kara wrote:
>
> Relatively simple but there are still some bugs lurking - I have the
> original issue fixed but I have implemented xfstests which stress the inode
> expansion code in various ways and so far they are still able to crash the
> kernel... Tomorrow I will be looking more into it and hopefully will be
> able to finish the fixes.
Great, thanks. I'll wait and see how your patches work out. If we
think all of the patches are simple/low-risk enough for -stable
material, I'd much rather go with a proper fix.
> Regarding your patch - I think it's a reasonable band-aid although I'll
> have to backport proper fixes to 4.4 anyway (it's a base of our next
> enterprise release), which is where i_projid got introduced as well, so
> eventually we will have proper stable fixes for all relevant kernels
> anyway.
Well, it depends on whether you intend to support upgrading an
existing file system or require users to run mkfs.ext4 before being
able to take advantage of the new feature. Red Hat is much more
conservative and has historically only supported running mke2fs from
scratch --- which probably made sense from their QA department's POV,
anyway. The band-aid patch is enough if you don't support "tune2fs -O
project", but only "mke2fs -t ext4 -O project". I'd personally prefer
upstream to support both flawlessly, so I'm thankful for your work to
fix things properly, but enterprise distros can be extremely
conservative.
Cheers,
- Ted
On Mon 01-08-16 09:09:05, Ted Tso wrote:
> On Mon, Aug 01, 2016 at 02:19:30PM +0200, Jan Kara wrote:
> > Regarding your patch - I think it's a reasonable band-aid although I'll
> > have to backport proper fixes to 4.4 anyway (it's a base of our next
> > enterprise release), which is where i_projid got introduced as well, so
> > eventually we will have proper stable fixes for all relevant kernels
> > anyway.
>
> Well, it depends on whether you intend to support upgrading an
> existing file system or require users to run mkfs.ext4 before being
> able to take advantage of the new feature. Red Hat is much more
> conservative and has historically only supported running mke2fs from
> scratch --- which probably made sense from their QA department's POV,
> anyway. The band-aid patch is enough if you don't support "tune2fs -O
> project", but only "mke2fs -t ext4 -O project". I'd personally prefer
> upstream to support both flawlessly, so I'm thankful for your work to
> fix things properly, but enterprise distros can be extremely
> conservative.
Well, in SUSE we were initially very conservative about ext4 but as ext4
has matured, we are not that cautious anymore. So e.g. turning on project
quota feature would be something we'd like to support unless there is a
really good reason not to...
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR