Hello.
In my board, i ran the iozone with multi-thread option.
My board has 8 cores and i enabled CONFIG_SMP.
the iozone command as follow: iozone -l 20 -u 20 -r 64k -s 5m -o -F /user/f1 /user/f2 /user/f3 /user/f4 /user/f5 /user/f6 /user/f7 /user/f8 /user/f9 /user/f10 /user/f11 /user/f12 /user/f13 /user/f14 /user/f15 /user/f16 /user/f17 /user/f18 /user/f19 /user/f20
mount info as follow
cat /proc/mounts
/dev/sda3 /user ext4 rw,noatime,nodiratime,errors=panic,barrier=1,data=ordered 0 0
I got a message as below every time i ran iozone test.
[ 4876.293124] [<ffffffff80272d18>] show_stack+0x68/0x80
[ 4876.309411] [<ffffffff802bde4c>] notifier_call_chain+0x5c/0xa8
[ 4876.315245] [<ffffffff802be524>] __atomic_notifier_call_chain+0x3c/0x58
[ 4876.321860] [<ffffffff802be588>] notify_die+0x38/0x48
[ 4876.326913] [<ffffffff80272548>] do_trap_or_bp+0x48/0x1a8
[ 4876.332312] [<ffffffff8026cfe4>] resume_userspace_check+0x0/0x10
[ 4876.338323] [<ffffffff80462994>] jbd2_journal_put_journal_head+0xcc/0x250
[ 4876.345111] [<ffffffff804603b4>] __jbd2_journal_remove_checkpoint+0x54/0x130
[ 4876.352160] [<ffffffff8045e630>] jbd2_journal_commit_transaction+0x1318/0x1ad0
[ 4876.359383] [<ffffffff80463f4c>] kjournald2+0x114/0x450
[ 4876.364611] [<ffffffff802b8160>] kthread+0xb8/0xc0
[ 4876.369402] [<ffffffff8026d060>] ret_from_kernel_thread+0x10/0x18
When i ran without multi-thread option, the problem was not ocurred.
Thanks.
On Thu, May 09, 2013 at 07:59:30AM +0000, EUNBONG SONG wrote:
>
> I got a message as below every time i ran iozone test.
>
>
> [ 4876.293124] [<ffffffff80272d18>] show_stack+0x68/0x80
> [ 4876.309411] [<ffffffff802bde4c>] notifier_call_chain+0x5c/0xa8
> [ 4876.315245] [<ffffffff802be524>] __atomic_notifier_call_chain+0x3c/0x58
> [ 4876.321860] [<ffffffff802be588>] notify_die+0x38/0x48
> [ 4876.326913] [<ffffffff80272548>] do_trap_or_bp+0x48/0x1a8
> [ 4876.332312] [<ffffffff8026cfe4>] resume_userspace_check+0x0/0x10
> [ 4876.338323] [<ffffffff80462994>] jbd2_journal_put_journal_head+0xcc/0x250
> [ 4876.345111] [<ffffffff804603b4>] __jbd2_journal_remove_checkpoint+0x54/0x130
> [ 4876.352160] [<ffffffff8045e630>] jbd2_journal_commit_transaction+0x1318/0x1ad0
> [ 4876.359383] [<ffffffff80463f4c>] kjournald2+0x114/0x450
> [ 4876.364611] [<ffffffff802b8160>] kthread+0xb8/0xc0
> [ 4876.369402] [<ffffffff8026d060>] ret_from_kernel_thread+0x10/0x18
>
> When i ran without multi-thread option, the problem was not ocurred.
Can you give us the full crash message, (i.e., the panic, the BUG,
WARN, the registers, etc.), and not the stack trace?
- Ted
> Can you give us the full crash message, (i.e., the panic, the BUG,
> WARN, the registers, etc.), and not the stack trace?
> - Ted
Hi, Ted
Actually i try to find the crash point. And i confirmed crash point is in __journal_remove_journal_head() function.
I added some debug code and i found J_ASSERT_JH is failed for jh->b_transaction.
My source tree has some modifications only for MIPS architecture. I don't think it does not affect to ext4 operation.
Also I confirmed the problem is not reproduced before merge 149b306089b88e186942a8d6647028ae6683aaf9.
149b306089b88e186942a8d6647028ae6683aaf9 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
My full crash messages are as follows.
Iozone: Performance Test of File I/O
Version $Revision: 3.397 $
Compiled for 64 bit mode.
Build: linux-powerpc64
Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
Al Slater, Scott Rhine, Mike Wisner, Ken Goss
Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
Erik Habbinga, Kris Strecker, Walter Wong, Joshua Roo[ 233.458766] CPU: 3 PID: 1535 Comm: iozone Not tainted 3.9.0+ #81
t,
Fabrice Bacchella, Zhenghua Xue, Qin Li, Darre[ 233.470132] Stack :n Sawyer.
Ben England.
Run began: Sun Dec 10 ffffffff8101143808:38:24 2000
Record Size 64 KB
File size set to 5120 KB
a80000008b64b000 SYNC Mode.
Command line used: iozone -l 20 -u 20 -r 64k -s 5 0000000000000003m -o -F /user/f1 /user/f2 /user/f3 /user/f4 /user/f5 /user/f6 /u ffffffff80292470ser/f7 /user/f8 /user/f9 /user/f10 /user/f11 /user/f12 /user/f13
/user/f14 /user/f15 /user/f16 /user/f17 /user/f18 /user/f19 /us 0000000000000000er/f20
Output is in Kbytes/sec
Time Resolution = 0.000001 se ffffffff80fa0000conds.
Processor cache size set to 1024 Kbytes.
Processor ca 000000000000001fche line size set to 32 bytes.
File stride size set to 17 * re ffffffff80293728cord size.
Min process = 20
Max process = 20
Throughput
test with 20 processes
Each process writes a 5120 Kbyte file i 0000000000000000n 64 Kbyte records
0000000000000000 ffffffff81080000 ffffffff81080000
ffffffff80e2abf0 ffffffff80f8f9f7 a8000002017a0db8 0000000000000020
0000000000000003 0000000000000020 a80000020025f968 ffffffff810f0000
a80000020025f770 ffffffff806ee88c a80000020025f588 ffffffff80290994
000000007ef10087 ffffffff80293b58 000000000000000a ffffffff80e2abf0
0000000000000003 a80000020025f4b0 00000002017a10f8 ffffffff805e68b4
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 ffffffff80272418 0000000000000000 0000000000000000
...
[ 233.604041] Call Trace:
[ 233.606495] [<ffffffff80272418>] show_stack+0x68/0x80
[ 233.611550] [<ffffffff805e68b4>] cdr_event_handler+0x604/0xbf8
[ 233.617384] [<ffffffff805e7648>] cdr_event_die+0xd0/0x150
[ 233.622784] [<ffffffff802bd42c>] notifier_call_chain+0x5c/0xa8
[ 233.628619] [<ffffffff802bdb04>] __atomic_notifier_call_chain+0x3c/0x58
[ 233.635233] [<ffffffff802bdb68>] notify_die+0x38/0x48
[ 233.640285] [<ffffffff80271c48>] do_trap_or_bp+0x48/0x1a8
[ 233.645684] [<ffffffff8026c764>] resume_userspace_check+0x0/0x10
[ 233.651695] [<ffffffff80460b64>] jbd2_journal_put_journal_head+0xcc/0x250
[ 233.658484] [<ffffffff8045a1b4>] jbd2_journal_get_write_access+0x3c/0x58
[ 233.665188] [<ffffffff804348a8>] __ext4_journal_get_write_access+0x58/0xa0
[ 233.672064] [<ffffffff80410344>] ext4_reserve_inode_write+0x84/0xb0
[ 233.678331] [<ffffffff804103ac>] ext4_mark_inode_dirty+0x3c/0x1e0
[ 233.684424] [<ffffffff80410590>] ext4_dirty_inode+0x40/0x70
[ 233.689998] [<ffffffff80392258>] __mark_inode_dirty+0x48/0x238
[ 233.695832] [<ffffffff803828f4>] update_time+0xb4/0x100
[ 233.701058] [<ffffffff803829f0>] file_update_time+0xb0/0x108
[ 233.706718] [<ffffffff8031eb98>] __generic_file_aio_write+0x180/0x380
[ 233.713158] [<ffffffff8031edf8>] generic_file_aio_write+0x60/0xc0
[ 233.719252] [<ffffffff8040af54>] ext4_file_write+0x6c/0x468
[ 233.724827] [<ffffffff80366cbc>] do_sync_write+0x84/0xe8
[ 233.730139] [<ffffffff80368700>] vfs_write+0xe0/0x1e0
[ 233.735191] [<ffffffff803688f8>] SyS_write+0x50/0xc0
[ 233.740157] [<ffffffff80274864>] handle_sys64+0x44/0x64
Thanks.
I think I have the same (or highly similar) thing happening on ia64.
Similarities: seeing assertions fail for b_transaction
Differences: I only have ext3 filesystems mounted, no ext4
See attached trace. I'm pretty certain that the highly unhelpful
bugcheck! 0 [1]
comes from the
J_ASSERT_JH(jh, jh->b_transaction == NULL);
from disassembling __journal_remove_journal_head(). The instruction
pointer <a0000001003d6690> points to the 2nd "break" instruction
in the function.
The problem shows up after 30 minutes to a couple of hours of stress (kernel
builds with "make -j32").
I'm pretty sure this problem didn't occur in plain v3.9 (it can run for
a full 24 hours).
Trying to bisect - but it takes a while to be convinced that a good kernel
is actually good (since I don't have a clear picture of how long to run
before deciding that the bug isn't going to show)
-Tony
Hmm, since you seem to be able to reproduce the problem reliably, any
chance you can try bisecting the problem? I've looked at the commits
that touch fs/jbd2 and nothing is jumping out at me.
Also, how many CPU's do you have your system, and what kind of storage
device were you using when you were running iozone (5400rpm HDD,
7200RPM HDD, RAID array, SSD, etc.)?
Thanks,
- Ted
On 05/10/2013 12:27 PM, Theodore Ts'o wrote:
> Hmm, since you seem to be able to reproduce the problem reliably, any
> chance you can try bisecting the problem? I've looked at the commits
> that touch fs/jbd2 and nothing is jumping out at me.
>
> Also, how many CPU's do you have your system, and what kind of storage
> device were you using when you were running iozone (5400rpm HDD,
> 7200RPM HDD, RAID array, SSD, etc.)?
I too have seen this.
My system is:
12 CPU Octeon (MIPS64)
Root is ext3, mounted via ext4fs on a slow CompactFlash/PIO6
I saw the crash when simply booting a fairly bare-bones Debian distro,
although it is somewhat random:
.
.
ata3: PATA max PIO6 cmd 900000001d040000 ctl 900000001d05000d irq 63
.
.
ata3.00: CFA: CF 4GB, 20101001, max UDMA/133
ata3.00: 7847280 sectors, multi 0: LBA
ata3.00: configured for PIO6
.
.
scsi 2:0:0:0: Direct-Access ATA CF 4GB 2010 PQ: 0 ANSI: 5
sd 2:0:0:0: [sda] 7847280 512-byte logical blocks: (4.01 GB/3.74 GiB)
sd 2:0:0:0: [sda] Write Protect is off
sd 2:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't
support DPO or FUA
sda: sda1 sda2
sd 2:0:0:0: [sda] Attached SCSI disk
EXT4-fs (sda2): mounting ext3 file system using the ext4 subsystem
EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
VFS: Mounted root (ext3 filesystem) readonly on device 8:2.
.
.
.
I have not been able to get it to fail a second time. So for me
bisecting might not work.
David Daney
>
> Thanks,
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
On Fri, 10 May 2013 10:27:58 -0700, Tony Luck <[email protected]> wrote:
Non-text part: multipart/mixed
> I think I have the same (or highly similar) thing happening on ia64.
What was page_size and fsblock size?
>
> Similarities: seeing assertions fail for b_transaction
> Differences: I only have ext3 filesystems mounted, no ext4
>
> See attached trace. I'm pretty certain that the highly unhelpful
>
> bugcheck! 0 [1]
>
> comes from the
>
> J_ASSERT_JH(jh, jh->b_transaction == NULL);
>
> from disassembling __journal_remove_journal_head(). The instruction
> pointer <a0000001003d6690> points to the 2nd "break" instruction
> in the function.
>
> The problem shows up after 30 minutes to a couple of hours of stress (kernel
> builds with "make -j32").
I cant reproduce this one yet.
But changes {ext3,jbd} are minimal
#git log --oneline v3.9.. fs/{ext3,jbd}
5af43c2 Merge branch 'akpm' (incoming from Andrew)
a27bb33 aio: don't include aio.h in sched.h
4385bab make blkdev_put() return void
14a9e5c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
e760040 fs/buffer.c: remove unnecessary init operation after allocating buffer_head.
713685111 mm: make snapshotting pages for stable writes a per-bio
operation
8bb9da9 jbd: use kmem_cache_zalloc for allocating journal head
e162b2f jbd: use kmem_cache_zalloc instead of kmem_cache_alloc/memset
e678a4f jbd: don't wait (forever) for stale tid caused by wraparound
e643692 ext3: fix data=journal fast mount/umount hang
So looks very strange..
I have ia64 and now I work on reproduction.
>
> I'm pretty sure this problem didn't occur in plain v3.9 (it can run for
> a full 24 hours).
>
> Trying to bisect - but it takes a while to be convinced that a good kernel
> is actually good (since I don't have a clear picture of how long to run
> before deciding that the bug isn't going to show)
>
> -Tony
Attachment: bug (application/octet-stream)
On Fri, 10 May 2013 15:27:47 -0400, Theodore Ts'o <[email protected]> wrote:
> Hmm, since you seem to be able to reproduce the problem reliably, any
> chance you can try bisecting the problem? I've looked at the commits
> that touch fs/jbd2 and nothing is jumping out at me.
>
> Also, how many CPU's do you have your system, and what kind of storage
> device were you using when you were running iozone (5400rpm HDD,
> 7200RPM HDD, RAID array, SSD, etc.)?
XFS TOO....
It it definitely just an ext3/4 related issue.
I've run xfstests on xfs and almost immediately have got slub corruption
I use following HEAD: 2dbd3cac87250a0d44e07acc86c4224a08522709
2013-05-11 11:59:30 Slab corruption (Not tainted): xfs_efi_item
start=ffff8802335063f0, len=400^M
2013-05-11 11:59:30 Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.^M
2013-05-11 11:59:30 Last user:
[<ffffffffa03ba36f>](xfs_efi_item_free+0x3f/0x50 [xfs])^M
2013-05-11 11:59:30 070: 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
jkkkkkkkkkkkkkkk^M
2013-05-11 11:59:30 Single bit error detected. Probably bad RAM.^M
2013-05-11 11:59:30 Run memtest86+ or a similar memory test tool.^M
2013-05-11 11:59:30 Prev obj: start=ffff880233506248, len=400^M
2013-05-11 11:59:30 Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.^M
2013-05-11 11:59:30 Last user:
[<ffffffffa034d7bb>](kmem_zone_alloc+0xbb/0x190 [xfs])^M
2013-05-11 11:59:30 000: 48 62 50 33 02 88 ff ff 48 62 50 33 02 88 ff ff
HbP3....HbP3....^M
2013-05-11 11:59:30 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
................^M
2013-05-11 11:59:30 Next obj: start=ffff880233506598, len=400^M
2013-05-11 11:59:30 Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.^M
>
> Thanks,
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, 11 May 2013 13:17:38 +0400, Dmitry Monakhov <[email protected]> wrote:
Non-text part: multipart/mixed
> On Sat, 11 May 2013 12:13:20 +0400, Dmitry Monakhov <[email protected]> wrote:
> > On Fri, 10 May 2013 15:27:47 -0400, Theodore Ts'o <[email protected]> wrote:
> > > Hmm, since you seem to be able to reproduce the problem reliably, any
> > > chance you can try bisecting the problem? I've looked at the commits
> > > that touch fs/jbd2 and nothing is jumping out at me.
> > >
> > > Also, how many CPU's do you have your system, and what kind of storage
> > > device were you using when you were running iozone (5400rpm HDD,
> > > 7200RPM HDD, RAID array, SSD, etc.)?
> Ok, I've able to reproduce corruption on ext4
> So at this moment we have:
> Slub corruption on XFS testcase: xfstests/generic/013
> Slub corruption on EXT4 testcase: xfstests/generic/299
I've bisected ext4 related issue. It is appeared that it is pure ext4
specific. Regression caused by following commit
commit 4eec708d263f0ee10861d69251708a225b64cac7
Author: Jan Kara <[email protected]>
Date: Thu Apr 11 23:56:53 2013 -0400
ext4: use io_end for multiple bios
TESTCASE: xfstests generic/299
>
> In fact both test cases (069'th and 299'th) are just stress tests.
> So this is likely a regression in mm layer. I try to bisect it now.
>
>
> #Testcase: xfstests generic/299
> #DMESG
> ------------[ cut here ]------------
> WARNING: at fs/ext4/inode.c:3223 ext4_ext_direct_IO+0x2cb/0x3c0()
> Modules linked in: cpufreq_ondemand usb_storage acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
> CPU: 3 PID: 30537 Comm: fio Not tainted 3.9.0+ #14
> Hardware name: /DQ67SW, BIOS SWQ6710H.86A.0052.2011.0520.1802 05/20/2011
> ffffffff81e44ed6 ffff8801259ffa28 ffffffff818b20ff ffff8801259ffa68
> ffffffff81060a87 ffff8801259ffa58 ffff880205a921d0 ffff880233192740
> ffff880176496400 0000000000000000 ffffffffffffffe4 ffff8801259ffa78
> Call Trace:
> [<ffffffff818b20ff>] dump_stack+0x19/0x22
> [<ffffffff81060a87>] warn_slowpath_common+0x87/0xb0
> [<ffffffff81060aca>] warn_slowpath_null+0x1a/0x20
> [<ffffffff813131eb>] ext4_ext_direct_IO+0x2cb/0x3c0
> [<ffffffff813161c0>] ? ext4_get_block_write_nolock+0x20/0x20
> [<ffffffff813132e0>] ? ext4_ext_direct_IO+0x3c0/0x3c0
> [<ffffffff8131ac9f>] ext4_direct_IO+0x22f/0x3c0
> [<ffffffff8118db75>] generic_file_direct_write+0x175/0x240
> [<ffffffff81191bc6>] __generic_file_aio_write+0x556/0x770
> [<ffffffff8130d2de>] ext4_file_dio_write+0x35e/0x4f0
> [<ffffffff812929b3>] ? aio_rw_vect_retry+0xc3/0x250
> [<ffffffff8130d5ae>] ext4_file_write+0x13e/0x190
> [<ffffffff8130d470>] ? ext4_file_dio_write+0x4f0/0x4f0
> [<ffffffff812929e3>] aio_rw_vect_retry+0xf3/0x250
> [<ffffffff811c7f53>] ? might_fault+0x73/0xe0
> [<ffffffff8130d470>] ? ext4_file_dio_write+0x4f0/0x4f0
> [<ffffffff812949da>] aio_run_iocb+0x25a/0x3b0
> [<ffffffff81294e26>] io_submit_one+0x2f6/0x3a0
> [<ffffffff8129512e>] do_io_submit+0x25e/0x2f0
> [<ffffffff812926b2>] ? lookup_ioctx+0xc2/0x100
> [<ffffffff812951d0>] SyS_io_submit+0x10/0x20
> [<ffffffff818c4ac2>] system_call_fastpath+0x16/0x1b
> ---[ end trace c96126e84d56efc2 ]---
> ------------[ cut here ]------------
> WARNING: at fs/ext4/inode.c:3223 ext4_ext_direct_IO+0x2cb/0x3c0()
> Modules linked in: cpufreq_ondemand usb_storage acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
> CPU: 1 PID: 30539 Comm: fio Tainted: G W 3.9.0+ #14
> Hardware name: /DQ67SW, BIOS SWQ6710H.86A.0052.2011.0520.1802 05/20/2011
> ffffffff81e44ed6 ffff880131209a28 ffffffff818b20ff ffff880131209a68
> ffffffff81060a87 ffff880131209a58 ffff8801f5347990 ffff88022b9b7e00
> ffff88023342c508 0000000000000000 ffffffffffffffe4 ffff880131209a78
> Call Trace:
> [<ffffffff818b20ff>] dump_stack+0x19/0x22
> [<ffffffff81060a87>] warn_slowpath_common+0x87/0xb0
> [<ffffffff81060aca>] warn_slowpath_null+0x1a/0x20
> [<ffffffff813131eb>] ext4_ext_direct_IO+0x2cb/0x3c0
> [<ffffffff813161c0>] ? ext4_get_block_write_nolock+0x20/0x20
> [<ffffffff813132e0>] ? ext4_ext_direct_IO+0x3c0/0x3c0
> [<ffffffff8131ac9f>] ext4_direct_IO+0x22f/0x3c0
> [<ffffffff8118db75>] generic_file_direct_write+0x175/0x240
> [<ffffffff81191bc6>] __generic_file_aio_write+0x556/0x770
> [<ffffffff8130d2de>] ext4_file_dio_write+0x35e/0x4f0
> [<ffffffff812929b3>] ? aio_rw_vect_retry+0xc3/0x250
> [<ffffffff8130d5ae>] ext4_file_write+0x13e/0x190
> [<ffffffff8130d470>] ? ext4_file_dio_write+0x4f0/0x4f0
> [<ffffffff812929e3>] aio_rw_vect_retry+0xf3/0x250
> [<ffffffff811c7f53>] ? might_fault+0x73/0xe0
> [<ffffffff8130d470>] ? ext4_file_dio_write+0x4f0/0x4f0
> [<ffffffff812949da>] aio_run_iocb+0x25a/0x3b0
> [<ffffffff81294e26>] io_submit_one+0x2f6/0x3a0
> [<ffffffff8129512e>] do_io_submit+0x25e/0x2f0
> [<ffffffff812926b2>] ? lookup_ioctx+0xc2/0x100
> [<ffffffff812951d0>] SyS_io_submit+0x10/0x20
> [<ffffffff818c4ac2>] system_call_fastpath+0x16/0x1b
> ---[ end trace c96126e84d56efc3 ]---
> Slab corruption (Tainted: G W ): ext4_io_end start=ffff88023342c508, len=64
> Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
> Last user: [<ffffffff8131fe9b>](ext4_release_io_end+0x12b/0x130)
> 030: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b a5 kkkkkkkkkkkkjkk.
> Prev obj: start=ffff88023342c4b0, len=64
> Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> Last user: [<ffffffff813200d3>](ext4_init_io_end+0x23/0x60)
> 000: 58 cf 42 33 02 88 ff ff c8 27 33 62 01 88 ff ff X.B3.....'3b....
> 010: d0 51 a4 30 02 88 ff ff 05 00 00 00 00 00 00 00 .Q.0............
> Next obj: start=ffff88023342c560, len=64
> Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> Last user: [<ffffffff813200d3>](ext4_init_io_end+0x23/0x60)
> 000: a8 d3 1e 27 02 88 ff ff a0 8d 47 2b 02 88 ff ff ...'......G+....
> 010: d0 51 a4 30 02 88 ff ff 05 00 00 00 00 00 00 00 .Q.0............
> Slab corruption (Tainted: G W ): ext4_io_end start=ffff880176496400, len=64
> Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
> Last user: [<ffffffff8131fe9b>](ext4_release_io_end+0x12b/0x130)
> 030: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b a5 kkkkkkkkkkkkjkk.
> Prev obj: start=ffff8801764963a8, len=64
> Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> Last user: [<ffffffff813200d3>](ext4_init_io_end+0x23/0x60)
> 000: a8 63 49 76 01 88 ff ff a8 63 49 76 01 88 ff ff .cIv.....cIv....
> 010: 90 b9 79 2b 02 88 ff ff 00 00 00 00 00 00 00 00 ..y+............
> Next obj: start=ffff880176496458, len=64
> Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
> Last user: [<ffffffff8131fe9b------------[ cut here ]------------
>
>
> > XFS TOO....
> > It it definitely just an ext3/4 related issue.
> > I've run xfstests on xfs and almost immediately have got slub corruption
> > I use following HEAD: 2dbd3cac87250a0d44e07acc86c4224a08522709
> >
> > 2013-05-11 11:59:30 Slab corruption (Not tainted): xfs_efi_item
> > start=ffff8802335063f0, len=400^M
> > 2013-05-11 11:59:30 Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.^M
> > 2013-05-11 11:59:30 Last user:
> > [<ffffffffa03ba36f>](xfs_efi_item_free+0x3f/0x50 [xfs])^M
> > 2013-05-11 11:59:30 070: 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > jkkkkkkkkkkkkkkk^M
> > 2013-05-11 11:59:30 Single bit error detected. Probably bad RAM.^M
> > 2013-05-11 11:59:30 Run memtest86+ or a similar memory test tool.^M
> > 2013-05-11 11:59:30 Prev obj: start=ffff880233506248, len=400^M
> > 2013-05-11 11:59:30 Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.^M
> > 2013-05-11 11:59:30 Last user:
> > [<ffffffffa034d7bb>](kmem_zone_alloc+0xbb/0x190 [xfs])^M
> > 2013-05-11 11:59:30 000: 48 62 50 33 02 88 ff ff 48 62 50 33 02 88 ff ff
> > HbP3....HbP3....^M
> > 2013-05-11 11:59:30 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > ................^M
> > 2013-05-11 11:59:30 Next obj: start=ffff880233506598, len=400^M
> > 2013-05-11 11:59:30 Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.^M
> > >
> > > Thanks,
> > >
> > > - Ted
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > > the body of a message to [email protected]
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, May 11, 2013 at 03:00:53PM +0400, Dmitry Monakhov wrote:
> I've bisected ext4 related issue. It is appeared that it is pure ext4
> specific. Regression caused by following commit
> commit 4eec708d263f0ee10861d69251708a225b64cac7
> Author: Jan Kara <[email protected]>
> Date: Thu Apr 11 23:56:53 2013 -0400
> ext4: use io_end for multiple bios
Hmm... the next question is why did I see this in my testing. Did you
find this on ia64, or x86? Also what about the slab corruption which
you saw when running XFS; was that unrelated?
Since at this point it's safer to rollback the change and we can
investigate more deeply how to fix it correctly for the next
development cycle, this is the patch which I'm testing.
- Ted
>From dfd5057908a66b414d5d5fa2fc7ff7945e5a277c Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <[email protected]>
Date: Sat, 11 May 2013 18:59:39 -0400
Subject: [PATCH] Revert "ext4: use io_end for multiple bios"
This reverts commit 4eec708d263f0ee10861d69251708a225b64cac7.
Multiple users have reported crashes which is apparently caused by
this commit. Thanks to Dmitry Monakhov for bisecting it.
Signed-off-by: "Theodore Ts'o" <[email protected]>
Cc: Dmitry Monakhov <[email protected]>
Cc: Jan Kara <[email protected]>
---
fs/ext4/ext4.h | 8 +---
fs/ext4/inode.c | 85 +++++++++++++++++---------------------
fs/ext4/page-io.c | 121 ++++++++++++++++++++----------------------------------
3 files changed, 85 insertions(+), 129 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0aabb34..5aae3d1 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -209,7 +209,6 @@ typedef struct ext4_io_end {
ssize_t size; /* size of the extent */
struct kiocb *iocb; /* iocb struct for AIO */
int result; /* error value for AIO */
- atomic_t count; /* reference counter */
} ext4_io_end_t;
struct ext4_io_submit {
@@ -2651,14 +2650,11 @@ extern int ext4_move_extents(struct file *o_filp, struct file *d_filp,
/* page-io.c */
extern int __init ext4_init_pageio(void);
+extern void ext4_add_complete_io(ext4_io_end_t *io_end);
extern void ext4_exit_pageio(void);
extern void ext4_ioend_shutdown(struct inode *);
+extern void ext4_free_io_end(ext4_io_end_t *io);
extern ext4_io_end_t *ext4_init_io_end(struct inode *inode, gfp_t flags);
-extern ext4_io_end_t *ext4_get_io_end(ext4_io_end_t *io_end);
-extern int ext4_put_io_end(ext4_io_end_t *io_end);
-extern void ext4_put_io_end_defer(ext4_io_end_t *io_end);
-extern void ext4_io_submit_init(struct ext4_io_submit *io,
- struct writeback_control *wbc);
extern void ext4_end_io_work(struct work_struct *work);
extern void ext4_io_submit(struct ext4_io_submit *io);
extern int ext4_bio_write_page(struct ext4_io_submit *io,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 793d44b..d666569 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1487,10 +1487,7 @@ static int mpage_da_submit_io(struct mpage_da_data *mpd,
struct ext4_io_submit io_submit;
BUG_ON(mpd->next_page <= mpd->first_page);
- ext4_io_submit_init(&io_submit, mpd->wbc);
- io_submit.io_end = ext4_init_io_end(inode, GFP_NOFS);
- if (!io_submit.io_end)
- return -ENOMEM;
+ memset(&io_submit, 0, sizeof(io_submit));
/*
* We need to start from the first_page to the next_page - 1
* to make sure we also write the mapped dirty buffer_heads.
@@ -1578,8 +1575,6 @@ static int mpage_da_submit_io(struct mpage_da_data *mpd,
pagevec_release(&pvec);
}
ext4_io_submit(&io_submit);
- /* Drop io_end reference we got from init */
- ext4_put_io_end_defer(io_submit.io_end);
return ret;
}
@@ -2238,16 +2233,9 @@ static int ext4_writepage(struct page *page,
*/
return __ext4_journalled_writepage(page, len);
- ext4_io_submit_init(&io_submit, wbc);
- io_submit.io_end = ext4_init_io_end(inode, GFP_NOFS);
- if (!io_submit.io_end) {
- redirty_page_for_writepage(wbc, page);
- return -ENOMEM;
- }
+ memset(&io_submit, 0, sizeof(io_submit));
ret = ext4_bio_write_page(&io_submit, page, len, wbc);
ext4_io_submit(&io_submit);
- /* Drop io_end reference we got from init */
- ext4_put_io_end_defer(io_submit.io_end);
return ret;
}
@@ -3078,13 +3066,9 @@ static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
struct inode *inode = file_inode(iocb->ki_filp);
ext4_io_end_t *io_end = iocb->private;
- /* if not async direct IO just return */
- if (!io_end) {
- inode_dio_done(inode);
- if (is_async)
- aio_complete(iocb, ret, 0);
- return;
- }
+ /* if not async direct IO or dio with 0 bytes write, just return */
+ if (!io_end || !size)
+ goto out;
ext_debug("ext4_end_io_dio(): io_end 0x%p "
"for inode %lu, iocb 0x%p, offset %llu, size %zd\n",
@@ -3092,13 +3076,25 @@ static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
size);
iocb->private = NULL;
+
+ /* if not aio dio with unwritten extents, just free io and return */
+ if (!(io_end->flag & EXT4_IO_END_UNWRITTEN)) {
+ ext4_free_io_end(io_end);
+out:
+ inode_dio_done(inode);
+ if (is_async)
+ aio_complete(iocb, ret, 0);
+ return;
+ }
+
io_end->offset = offset;
io_end->size = size;
if (is_async) {
io_end->iocb = iocb;
io_end->result = ret;
}
- ext4_put_io_end_defer(io_end);
+
+ ext4_add_complete_io(io_end);
}
/*
@@ -3132,7 +3128,6 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
get_block_t *get_block_func = NULL;
int dio_flags = 0;
loff_t final_size = offset + count;
- ext4_io_end_t *io_end = NULL;
/* Use the old path for reads and writes beyond i_size. */
if (rw != WRITE || final_size > inode->i_size)
@@ -3171,16 +3166,13 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
iocb->private = NULL;
ext4_inode_aio_set(inode, NULL);
if (!is_sync_kiocb(iocb)) {
- io_end = ext4_init_io_end(inode, GFP_NOFS);
+ ext4_io_end_t *io_end = ext4_init_io_end(inode, GFP_NOFS);
if (!io_end) {
ret = -ENOMEM;
goto retake_lock;
}
io_end->flag |= EXT4_IO_END_DIRECT;
- /*
- * Grab reference for DIO. Will be dropped in ext4_end_io_dio()
- */
- iocb->private = ext4_get_io_end(io_end);
+ iocb->private = io_end;
/*
* we save the io structure for current async direct
* IO, so that later ext4_map_blocks() could flag the
@@ -3204,27 +3196,26 @@ static ssize_t ext4_ext_direct_IO(int rw, struct kiocb *iocb,
NULL,
dio_flags);
+ if (iocb->private)
+ ext4_inode_aio_set(inode, NULL);
/*
- * Put our reference to io_end. This can free the io_end structure e.g.
- * in sync IO case or in case of error. It can even perform extent
- * conversion if all bios we submitted finished before we got here.
- * Note that in that case iocb->private can be already set to NULL
- * here.
+ * The io_end structure takes a reference to the inode, that
+ * structure needs to be destroyed and the reference to the
+ * inode need to be dropped, when IO is complete, even with 0
+ * byte write, or failed.
+ *
+ * In the successful AIO DIO case, the io_end structure will
+ * be destroyed and the reference to the inode will be dropped
+ * after the end_io call back function is called.
+ *
+ * In the case there is 0 byte write, or error case, since VFS
+ * direct IO won't invoke the end_io call back function, we
+ * need to free the end_io structure here.
*/
- if (io_end) {
- ext4_inode_aio_set(inode, NULL);
- ext4_put_io_end(io_end);
- /*
- * In case of error or no write ext4_end_io_dio() was not
- * called so we have to put iocb's reference.
- */
- if (ret <= 0 && ret != -EIOCBQUEUED) {
- WARN_ON(iocb->private != io_end);
- ext4_put_io_end(io_end);
- iocb->private = NULL;
- }
- }
- if (ret > 0 && !overwrite && ext4_test_inode_state(inode,
+ if (ret != -EIOCBQUEUED && ret <= 0 && iocb->private) {
+ ext4_free_io_end(iocb->private);
+ iocb->private = NULL;
+ } else if (ret > 0 && !overwrite && ext4_test_inode_state(inode,
EXT4_STATE_DIO_UNWRITTEN)) {
int err;
/*
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 5929cd0..6626aba 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -61,28 +61,15 @@ void ext4_ioend_shutdown(struct inode *inode)
cancel_work_sync(&EXT4_I(inode)->i_unwritten_work);
}
-static void ext4_release_io_end(ext4_io_end_t *io_end)
+void ext4_free_io_end(ext4_io_end_t *io)
{
- BUG_ON(!list_empty(&io_end->list));
- BUG_ON(io_end->flag & EXT4_IO_END_UNWRITTEN);
-
- if (atomic_dec_and_test(&EXT4_I(io_end->inode)->i_ioend_count))
- wake_up_all(ext4_ioend_wq(io_end->inode));
- if (io_end->flag & EXT4_IO_END_DIRECT)
- inode_dio_done(io_end->inode);
- if (io_end->iocb)
- aio_complete(io_end->iocb, io_end->result, 0);
- kmem_cache_free(io_end_cachep, io_end);
-}
-
-static void ext4_clear_io_unwritten_flag(ext4_io_end_t *io_end)
-{
- struct inode *inode = io_end->inode;
+ BUG_ON(!io);
+ BUG_ON(!list_empty(&io->list));
+ BUG_ON(io->flag & EXT4_IO_END_UNWRITTEN);
- io_end->flag &= ~EXT4_IO_END_UNWRITTEN;
- /* Wake up anyone waiting on unwritten extent conversion */
- if (atomic_dec_and_test(&EXT4_I(inode)->i_unwritten))
- wake_up_all(ext4_ioend_wq(inode));
+ if (atomic_dec_and_test(&EXT4_I(io->inode)->i_ioend_count))
+ wake_up_all(ext4_ioend_wq(io->inode));
+ kmem_cache_free(io_end_cachep, io);
}
/* check a range of space and convert unwritten extents to written. */
@@ -105,8 +92,13 @@ static int ext4_end_io(ext4_io_end_t *io)
"(inode %lu, offset %llu, size %zd, error %d)",
inode->i_ino, offset, size, ret);
}
- ext4_clear_io_unwritten_flag(io);
- ext4_release_io_end(io);
+ /* Wake up anyone waiting on unwritten extent conversion */
+ if (atomic_dec_and_test(&EXT4_I(inode)->i_unwritten))
+ wake_up_all(ext4_ioend_wq(inode));
+ if (io->flag & EXT4_IO_END_DIRECT)
+ inode_dio_done(inode);
+ if (io->iocb)
+ aio_complete(io->iocb, io->result, 0);
return ret;
}
@@ -137,7 +129,7 @@ static void dump_completed_IO(struct inode *inode)
}
/* Add the io_end to per-inode completed end_io list. */
-static void ext4_add_complete_io(ext4_io_end_t *io_end)
+void ext4_add_complete_io(ext4_io_end_t *io_end)
{
struct ext4_inode_info *ei = EXT4_I(io_end->inode);
struct workqueue_struct *wq;
@@ -174,6 +166,8 @@ static int ext4_do_flush_completed_IO(struct inode *inode)
err = ext4_end_io(io);
if (unlikely(!ret && err))
ret = err;
+ io->flag &= ~EXT4_IO_END_UNWRITTEN;
+ ext4_free_io_end(io);
}
return ret;
}
@@ -205,43 +199,10 @@ ext4_io_end_t *ext4_init_io_end(struct inode *inode, gfp_t flags)
atomic_inc(&EXT4_I(inode)->i_ioend_count);
io->inode = inode;
INIT_LIST_HEAD(&io->list);
- atomic_set(&io->count, 1);
}
return io;
}
-void ext4_put_io_end_defer(ext4_io_end_t *io_end)
-{
- if (atomic_dec_and_test(&io_end->count)) {
- if (!(io_end->flag & EXT4_IO_END_UNWRITTEN) || !io_end->size) {
- ext4_release_io_end(io_end);
- return;
- }
- ext4_add_complete_io(io_end);
- }
-}
-
-int ext4_put_io_end(ext4_io_end_t *io_end)
-{
- int err = 0;
-
- if (atomic_dec_and_test(&io_end->count)) {
- if (io_end->flag & EXT4_IO_END_UNWRITTEN) {
- err = ext4_convert_unwritten_extents(io_end->inode,
- io_end->offset, io_end->size);
- ext4_clear_io_unwritten_flag(io_end);
- }
- ext4_release_io_end(io_end);
- }
- return err;
-}
-
-ext4_io_end_t *ext4_get_io_end(ext4_io_end_t *io_end)
-{
- atomic_inc(&io_end->count);
- return io_end;
-}
-
/*
* Print an buffer I/O error compatible with the fs/buffer.c. This
* provides compatibility with dmesg scrapers that look for a specific
@@ -324,7 +285,12 @@ static void ext4_end_bio(struct bio *bio, int error)
bi_sector >> (inode->i_blkbits - 9));
}
- ext4_put_io_end_defer(io_end);
+ if (!(io_end->flag & EXT4_IO_END_UNWRITTEN)) {
+ ext4_free_io_end(io_end);
+ return;
+ }
+
+ ext4_add_complete_io(io_end);
}
void ext4_io_submit(struct ext4_io_submit *io)
@@ -338,37 +304,40 @@ void ext4_io_submit(struct ext4_io_submit *io)
bio_put(io->io_bio);
}
io->io_bio = NULL;
-}
-
-void ext4_io_submit_init(struct ext4_io_submit *io,
- struct writeback_control *wbc)
-{
- io->io_op = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE);
- io->io_bio = NULL;
+ io->io_op = 0;
io->io_end = NULL;
}
-static int io_submit_init_bio(struct ext4_io_submit *io,
- struct buffer_head *bh)
+static int io_submit_init(struct ext4_io_submit *io,
+ struct inode *inode,
+ struct writeback_control *wbc,
+ struct buffer_head *bh)
{
+ ext4_io_end_t *io_end;
+ struct page *page = bh->b_page;
int nvecs = bio_get_nr_vecs(bh->b_bdev);
struct bio *bio;
+ io_end = ext4_init_io_end(inode, GFP_NOFS);
+ if (!io_end)
+ return -ENOMEM;
bio = bio_alloc(GFP_NOIO, min(nvecs, BIO_MAX_PAGES));
bio->bi_sector = bh->b_blocknr * (bh->b_size >> 9);
bio->bi_bdev = bh->b_bdev;
+ bio->bi_private = io->io_end = io_end;
bio->bi_end_io = ext4_end_bio;
- bio->bi_private = ext4_get_io_end(io->io_end);
- if (!io->io_end->size)
- io->io_end->offset = (bh->b_page->index << PAGE_CACHE_SHIFT)
- + bh_offset(bh);
+
+ io_end->offset = (page->index << PAGE_CACHE_SHIFT) + bh_offset(bh);
+
io->io_bio = bio;
+ io->io_op = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE);
io->io_next_block = bh->b_blocknr;
return 0;
}
static int io_submit_add_bh(struct ext4_io_submit *io,
struct inode *inode,
+ struct writeback_control *wbc,
struct buffer_head *bh)
{
ext4_io_end_t *io_end;
@@ -379,18 +348,18 @@ submit_and_retry:
ext4_io_submit(io);
}
if (io->io_bio == NULL) {
- ret = io_submit_init_bio(io, bh);
+ ret = io_submit_init(io, inode, wbc, bh);
if (ret)
return ret;
}
- ret = bio_add_page(io->io_bio, bh->b_page, bh->b_size, bh_offset(bh));
- if (ret != bh->b_size)
- goto submit_and_retry;
io_end = io->io_end;
if (test_clear_buffer_uninit(bh))
ext4_set_io_unwritten_flag(inode, io_end);
- io_end->size += bh->b_size;
+ io->io_end->size += bh->b_size;
io->io_next_block++;
+ ret = bio_add_page(io->io_bio, bh->b_page, bh->b_size, bh_offset(bh));
+ if (ret != bh->b_size)
+ goto submit_and_retry;
return 0;
}
@@ -462,7 +431,7 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
do {
if (!buffer_async_write(bh))
continue;
- ret = io_submit_add_bh(io, inode, bh);
+ ret = io_submit_add_bh(io, inode, wbc, bh);
if (ret) {
/*
* We only get here on ENOMEM. Not much else
--
1.7.12.rc0.22.gcdd159b
> Since at this point it's safer to rollback the change and we can
> investigate more deeply how to fix it correctly for the next
> development cycle, this is the patch which I'm testing.
> - Ted
Hello, I've tested with your patch. But the same problem was reproduced.
Currently, I'm trying to git bisect. If i done git bisect, i will let you know.
Thanks.
>> Since at this point it's safer to rollback the change and we can
>> investigate more deeply how to fix it correctly for the next
>> development cycle, this is the patch which I'm testing.
>> - Ted
> Hello, I've tested with your patch. But the same problem was reproduced.
> Currently, I'm trying to git bisect. If i done git bisect, i will let you know.
Hi, I've done git bisect. and panic at jbd2_journal_put_journal_head() is caused by
ae4647fb7654676fc44a97e86eb35f9f06b99f66: "jbd2: reduce journal_head size."
I write just code patch which revert ae4647fb7654676fc44a97e86eb35f9f06b99f66 because
I don't know the root cause.
Signed-off-by: Eunbong Song <[email protected]>
---
include/linux/journal-head.h | 11 +++++++++--
1 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/include/linux/journal-head.h b/include/linux/journal-head.h
index 13a3da2..c18b46f 100644
--- a/include/linux/journal-head.h
+++ b/include/linux/journal-head.h
@@ -31,14 +31,21 @@ struct journal_head {
/*
* Journalling list for this buffer [jbd_lock_bh_state()]
*/
- unsigned b_jlist:4;
+ unsigned b_jlist;
/*
* This flag signals the buffer has been modified by
* the currently running transaction
* [jbd_lock_bh_state()]
*/
- unsigned b_modified:1;
+ unsigned b_modified;
+
+ /*
+ * This feild tracks the last transaction id in which this buffer
+ * has been cowed
+ * [jbd_lock_bh_state()]
+ */
+ tid_t b_cow_tid;
/*
* Copy of the buffer data frozen for writing to the log.
--
1.7.0.4
Thanks.
On Sun, 12 May 2013 13:05:00 +0000 (GMT), EUNBONG SONG <[email protected]> wrote:
>
>
> >> Since at this point it's safer to rollback the change and we can
> >> investigate more deeply how to fix it correctly for the next
> >> development cycle, this is the patch which I'm testing.
>
> >> - Ted
>
> > Hello, I've tested with your patch. But the same problem was reproduced.
> > Currently, I'm trying to git bisect. If i done git bisect, i will let you know.
>
> Hi, I've done git bisect. and panic at jbd2_journal_put_journal_head() is caused by
> ae4647fb7654676fc44a97e86eb35f9f06b99f66: "jbd2: reduce journal_head size."
OK that explains both regressions ext4/jbd2 and ext3/jbd(https://lkml.org/lkml/2013/5/10/434)
Probably some race because b_jlist accessed w/o lock some places.
BUT i still can not reproduce original issue :(
I've tried all types of disks near me (ssd,hdd,usb-dongle),
What disk type you use? Is it reproducible on x86?
Please post following info:
#fdisk -l $HDD
#hdparm -t $HDD
#df -h
#cat /proc/mounts
> I write just code patch which revert ae4647fb7654676fc44a97e86eb35f9f06b99f66 because
> I don't know the root cause.
>
>
> Signed-off-by: Eunbong Song <[email protected]>
> ---
> include/linux/journal-head.h | 11 +++++++++--
> 1 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/journal-head.h b/include/linux/journal-head.h
> index 13a3da2..c18b46f 100644
> --- a/include/linux/journal-head.h
> +++ b/include/linux/journal-head.h
> @@ -31,14 +31,21 @@ struct journal_head {
> /*
> * Journalling list for this buffer [jbd_lock_bh_state()]
> */
> - unsigned b_jlist:4;
> + unsigned b_jlist;
>
> /*
> * This flag signals the buffer has been modified by
> * the currently running transaction
> * [jbd_lock_bh_state()]
> */
> - unsigned b_modified:1;
> + unsigned b_modified;
> +
> + /*
> + * This feild tracks the last transaction id in which this buffer
> + * has been cowed
> + * [jbd_lock_bh_state()]
> + */
> + tid_t b_cow_tid;
>
> /*
> * Copy of the buffer data frozen for writing to the log.
> --
> 1.7.0.4
>
>
> Thanks. N㎠咽ry鉉bXФ푤v^)頻{.n+램{喩zX㎍●}꼿z&j:+v㉭喩zZ++zf"h~i鎬zwⅱ?②&)刪f^j푹ym
@Aa뛴0띠hi
On Sat, May 11, 2013 at 12:52 AM, Dmitry Monakhov <[email protected]> wrote:.
> What was page_size and fsblock size?
CONFIG_IA64_PAGE_SIZE_64KB=y
fsblock size is whatever is the default for SLES11SP2 on ia64 - which
tool will tell me?
My git bisect finally competed and points the a finger at:
bisect> git bisect good
ae4647fb7654676fc44a97e86eb35f9f06b99f66 is first bad commit
commit ae4647fb7654676fc44a97e86eb35f9f06b99f66
Author: Jan Kara <[email protected]>
Date: Fri Apr 12 00:03:42 2013 -0400
jbd2: reduce journal_head size
Remove unused t_cow_tid field (ext4 copy-on-write support doesn't seem
to be happening) and change b_modified and b_jlist to bitfields thus
saving 8 bytes in the structure.
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Reviewed-by: Zheng Liu <[email protected]>
:040000 040000 c39ece4341894b3daf84764ba425a87ffb90fe50
d4e8d9185c2a1b740c235ca8ed05d496a442fce3 M include
-Tony
> CONFIG_IA64_PAGE_SIZE_64KB=y
> fsblock size is whatever is the default for SLES11SP2 on ia64 - which
> tool will tell me?
> My git bisect finally competed and points the a finger at:
> bisect> git bisect good
> ae4647fb7654676fc44a97e86eb35f9f06b99f66 is first bad commit
> commit ae4647fb7654676fc44a97e86eb35f9f06b99f66
> Author: Jan Kara
> Date: Fri Apr 12 00:03:42 2013 -0400
> jbd2: reduce journal_head size
> Remove unused t_cow_tid field (ext4 copy-on-write support doesn't seem
> to be happening) and change b_modified and b_jlist to bitfields thus
> saving 8 bytes in the structure.
> Signed-off-by: Jan Kara
> Signed-off-by: "Theodore Ts'o"
> Reviewed-by: Zheng Liu
> :040000 040000 c39ece4341894b3daf84764ba425a87ffb90fe50
> d4e8d9185c2a1b740c235ca8ed05d496a442fce3 M include
Hi, my git bisect result is same yours. And i reported that to community yesterday.
Thanks.
On Sun, May 12, 2013 at 07:04:45PM -0700, Tony Luck wrote:
> My git bisect finally competed and points the a finger at:
>
> commit ae4647fb7654676fc44a97e86eb35f9f06b99f66
> Author: Jan Kara <[email protected]>
> Date: Fri Apr 12 00:03:42 2013 -0400
>
> jbd2: reduce journal_head size
>
> Remove unused t_cow_tid field (ext4 copy-on-write support doesn't seem
> to be happening) and change b_modified and b_jlist to bitfields thus
> saving 8 bytes in the structure.
Both you and Eunbong Song bisected to the same commit, so presumably
the right thing to do at this point is to revert it. Have you tried
reverting the commit and demonstrating that the problem goes away
afterwards?
The reason why I ask is that I'm completely at a lost to understand
why this commit could be making a difference. Loooking at the commit,
we're converting two unsigned fields, neither of which use more than 4
bits or 1 bits, respectively, to use bitfields instead. Why this
could be causing __journal_remove_journal_head() to fail, especially
in the way that it does, isn't making any sense to me. We are
technically accessing jh->b_jlist without first locking
jbd2_lock_bh_state(), but (a) it shouldn't make a difference whether
we use a bitfield or 32-bit unsigned value, and (b) by the time we get
to __journal_remove_journal_head(), nothing should be using the
journal head, and we've locked jbd_lock_bh_journal_head(), which
should prevent any one else from starting to use the journal head.
Applying patch where I don't understand how it would make things
better, even if it is a revert, scares me. If we are going to do
this, and since I haven't yet been able to reproduce it on my testing
setup, could you try taking Linus's just released 3.10-rc1 release,
and revert commit ae4647fb765467, and confirm that this avoids the
crash which you are seeing?
Thanks,
- Ted
On Sun, May 12, 2013 at 7:21 PM, EUNBONG SONG <[email protected]> wrote:
> Hi, my git bisect result is same yours. And i reported that to community yesterday.
Ah. Good to have some confirmation (I was never sure how long to keep
running before deciding that a test was "good". My slowest "bad" test took
about 2.5 hours. I mostly let the tests run for >6 hours before deciding.
I just confirmed that 3.10-rc1 still fails (30 minutes). Now running a test
on 3.10-rc1 with just this commit reverted. Only been going for about
15 minutes, so no useful information yet.
My best guess as to why this commit causes problems is that there are places
where updates to individual fields in this structure used to be independent
because they were to whole words. Now we have bitfileds there are races
between access to different fields in the same word.
-Tony
On Sun, May 12, 2013 at 08:11:59PM -0700, Tony Luck wrote:
>
> My best guess as to why this commit causes problems is that there are places
> where updates to individual fields in this structure used to be independent
> because they were to whole words. Now we have bitfileds there are races
> between access to different fields in the same word.
Yeah, except we access the fields while holding a lock.... wait a
minute. We're using bit_spinlocks().... and am I missing something?
Where are the barrier statements to prevent the CPU or the compiler
from reordering statements around bit_spin_lock()? But if that's the
problem, I would have expected lots of other things to be broken.
- Ted
> Hi,
> Bitfields are likely to be implemented using read-modify-write semantics.
> Modifications of either b_jlist or b_jmodified must be done under lock
> since they share same uint. I guess this lock is missing somewhere.
Hi, I agree with you. b_jlist and b_jmodified share the same unit.
I think they are separated.
Thanks.
On Sun, 2013-05-12 at 23:36 -0400, Theodore Ts'o wrote:
> On Sun, May 12, 2013 at 08:11:59PM -0700, Tony Luck wrote:
> >
> > My best guess as to why this commit causes problems is that there are places
> > where updates to individual fields in this structure used to be independent
> > because they were to whole words. Now we have bitfileds there are races
> > between access to different fields in the same word.
>
> Yeah, except we access the fields while holding a lock.... wait a
> minute. We're using bit_spinlocks().... and am I missing something?
>
> Where are the barrier statements to prevent the CPU or the compiler
> from reordering statements around bit_spin_lock()? But if that's the
> problem, I would have expected lots of other things to be broken.
Those use test_and_set_bit(), which per Paul McMemory-Wizard...
ATOMIC OPERATIONS
-----------------
Whilst they are technically interprocessor interaction considerations, atomic
operations are noted specially as some of them imply full memory barriers and
some don't, but they're very heavily relied on as a group throughout the
kernel.
Any atomic operation that modifies some state in memory and returns information
about the state (old or new) implies an SMP-conditional general memory barrier
(smp_mb()) on each side of the actual operation (with the exception of
explicit lock operations, described later). These include:
xchg();
cmpxchg();
atomic_xchg();
atomic_cmpxchg();
atomic_inc_return();
atomic_dec_return();
atomic_add_return();
atomic_sub_return();
atomic_inc_and_test();
atomic_dec_and_test();
atomic_sub_and_test();
atomic_add_negative();
atomic_add_unless(); /* when succeeds (returns 1) */
test_and_set_bit();
test_and_clear_bit();
test_and_change_bit();
Hi,
Bitfields are likely to be implemented using read-modify-write semantics.
Modifications of either b_jlist or b_jmodified must be done under lock
since they share same uint. I guess this lock is missing somewhere.
Regards,
Andrei.
On 12.05.2013 20:07, Theodore Ts'o wrote:
> On Sun, May 12, 2013 at 07:04:45PM -0700, Tony Luck wrote:
>> My git bisect finally competed and points the a finger at:
>>
>> commit ae4647fb7654676fc44a97e86eb35f9f06b99f66
>> Author: Jan Kara <[email protected]>
>> Date: Fri Apr 12 00:03:42 2013 -0400
>>
>> jbd2: reduce journal_head size
>>
>> Remove unused t_cow_tid field (ext4 copy-on-write support doesn't seem
>> to be happening) and change b_modified and b_jlist to bitfields thus
>> saving 8 bytes in the structure.
> Both you and Eunbong Song bisected to the same commit, so presumably
> the right thing to do at this point is to revert it. Have you tried
> reverting the commit and demonstrating that the problem goes away
> afterwards?
>
> The reason why I ask is that I'm completely at a lost to understand
> why this commit could be making a difference. Loooking at the commit,
> we're converting two unsigned fields, neither of which use more than 4
> bits or 1 bits, respectively, to use bitfields instead. Why this
> could be causing __journal_remove_journal_head() to fail, especially
> in the way that it does, isn't making any sense to me. We are
> technically accessing jh->b_jlist without first locking
> jbd2_lock_bh_state(), but (a) it shouldn't make a difference whether
> we use a bitfield or 32-bit unsigned value, and (b) by the time we get
> to __journal_remove_journal_head(), nothing should be using the
> journal head, and we've locked jbd_lock_bh_journal_head(), which
> should prevent any one else from starting to use the journal head.
>
> Applying patch where I don't understand how it would make things
> better, even if it is a revert, scares me. If we are going to do
> this, and since I haven't yet been able to reproduce it on my testing
> setup, could you try taking Linus's just released 3.10-rc1 release,
> and revert commit ae4647fb765467, and confirm that this avoids the
> crash which you are seeing?
>
> Thanks,
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
The 3.10-rc1 with ae4647fb765467 reverted is still running OK. At
3 hours now (only marginally longer that the 2.5 hours that one of
the "bad" runs during the bisect managed). So I'm about 30% sure
that we have a winner at the moment. I'll leave it running and check
again in the morning. This penguin is heading to bed now.
-Tony
Hi,
I have some problem to boot with 3.10-rc1. So i will test with e0fd9affeb64088eff407dfc98bbd3a5c17ea479.
The commit message is as follow.
commit e0fd9affeb64088eff407dfc98bbd3a5c17ea479
Merge: 3d15b79 ea9627c
Author: Linus Torvalds <[email protected]>
Date: Wed May 8 15:29:48 2013 -0700
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Usually, the problem was reproduced in 30 seconds in my board.
I will test only revert ae4647fb7654676fc44a97e86eb35f9f06b99f66(jbd2: reduce journal_head size) and
i will let you know the test result.
Thanks.
On Sun, May 12, 2013 at 07:04:45PM -0700, Tony Luck wrote:
> On Sat, May 11, 2013 at 12:52 AM, Dmitry Monakhov <[email protected]> wrote:.
> > What was page_size and fsblock size?
>
> CONFIG_IA64_PAGE_SIZE_64KB=y
>
> fsblock size is whatever is the default for SLES11SP2 on ia64 - which
> tool will tell me?
>
> My git bisect finally competed and points the a finger at:
>
> bisect> git bisect good
> ae4647fb7654676fc44a97e86eb35f9f06b99f66 is first bad commit
> commit ae4647fb7654676fc44a97e86eb35f9f06b99f66
> Author: Jan Kara <[email protected]>
> Date: Fri Apr 12 00:03:42 2013 -0400
>
> jbd2: reduce journal_head size
>
> Remove unused t_cow_tid field (ext4 copy-on-write support doesn't seem
> to be happening) and change b_modified and b_jlist to bitfields thus
> saving 8 bytes in the structure.
>
> Signed-off-by: Jan Kara <[email protected]>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> Reviewed-by: Zheng Liu <[email protected]>
>
> :040000 040000 c39ece4341894b3daf84764ba425a87ffb90fe50
> d4e8d9185c2a1b740c235ca8ed05d496a442fce3 M include
Hi all,
First of all I couldn't reproduce this regression in my sand box. So
the following speculation is only my guess. I suspect that the commit
(ae4647fb) isn't root cause. It just uncover a potential bug that has
been there for a long time. I look at the code, and found two
suspicious stuff in jbd2. The first one is in do_get_write_access().
In this function we forgot to lock bh state when we check b_jlist ==
BJ_Shadow. I generate a patch to fix it, and I really think it is the
root cause. Further, in __journal_remove_journal_head() we check
b_jlist == BJ_None. But, when this function is called, bh state won't
be locked sometimes. So I suspect this is why we hit a BUG in
jbd2_journal_put_journal_head(). But I don't have a good solution to
fix this until now because I don't know whether we need to lock bh state
here, or maybe we should remove this assertation.
So, generally, Tony, Eunbong, could you please try the following patch?
Thanks in advance,
- Zheng
Subject: [PATCH] jbd2: lock bh state when check b_jlist == BJ_Shadow
From: Zheng Liu <[email protected]>
When we try to check b_jlist's value we need to lock bh state. But in
do_get_write_access when we check b_jlist == BJ_Shadow we won't lock bh
state. So fix it.
Signed-off-by: Zheng Liu <[email protected]>
---
fs/jbd2/transaction.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 10f524c..a800513 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -761,16 +761,18 @@ repeat:
wqh = bit_waitqueue(&bh->b_state, BH_Unshadow);
JBUFFER_TRACE(jh, "on shadow: sleep");
- jbd_unlock_bh_state(bh);
/* commit wakes up all shadow buffers after IO */
- for ( ; ; ) {
- prepare_to_wait(wqh, &wait.wait,
- TASK_UNINTERRUPTIBLE);
+ do {
if (jh->b_jlist != BJ_Shadow)
break;
+ prepare_to_wait(wqh, &wait.wait,
+ TASK_UNINTERRUPTIBLE);
+ jbd_unlock_bh_state(bh);
schedule();
- }
- finish_wait(wqh, &wait.wait);
+ finish_wait(wqh, &wait.wait);
+ jbd_lock_bh_state(bh);
+ } while (1);
+ jbd_unlock_bh_state(bh);
goto repeat;
}
--
1.7.12.rc2.18.g61b472e
> Hi all,
> First of all I couldn't reproduce this regression in my sand box. So
> the following speculation is only my guess. I suspect that the commit
> (ae4647fb) isn't root cause. It just uncover a potential bug that has
> been there for a long time. I look at the code, and found two
> suspicious stuff in jbd2. The first one is in do_get_write_access().
> In this function we forgot to lock bh state when we check b_jlist ==
> BJ_Shadow. I generate a patch to fix it, and I really think it is the
> root cause. Further, in __journal_remove_journal_head() we check
> b_jlist == BJ_None. But, when this function is called, bh state won't
> be locked sometimes. So I suspect this is why we hit a BUG in
> jbd2_journal_put_journal_head(). But I don't have a good solution to
> fix this until now because I don't know whether we need to lock bh state
> here, or maybe we should remove this assertation.
>
> So, generally, Tony, Eunbong, could you please try the following patch?
>
> Thanks in advance,
> - Zheng
Hi, I tested your patch. Unfortunately, the same problem was reproduced.
Thanks.
On Mon, May 13, 2013 at 09:53:25AM +0000, EUNBONG SONG wrote:
>
>
> > Hi all,
>
> > First of all I couldn't reproduce this regression in my sand box. So
> > the following speculation is only my guess. I suspect that the commit
> > (ae4647fb) isn't root cause. It just uncover a potential bug that has
> > been there for a long time. I look at the code, and found two
> > suspicious stuff in jbd2. The first one is in do_get_write_access().
> > In this function we forgot to lock bh state when we check b_jlist ==
> > BJ_Shadow. I generate a patch to fix it, and I really think it is the
> > root cause. Further, in __journal_remove_journal_head() we check
> > b_jlist == BJ_None. But, when this function is called, bh state won't
> > be locked sometimes. So I suspect this is why we hit a BUG in
> > jbd2_journal_put_journal_head(). But I don't have a good solution to
> > fix this until now because I don't know whether we need to lock bh state
> > here, or maybe we should remove this assertation.
> >
> > So, generally, Tony, Eunbong, could you please try the following patch?
> >
> > Thanks in advance,
> > - Zheng
>
>
> Hi, I tested your patch. Unfortunately, the same problem was reproduced.
> Thanks.
Thanks for trying this patch. Could you please repost the dmesg log for
me? I want to make sure whether the second suspicious stuff causes this
regression or not. Further, that would be great if you could try to
comment this line as the following?
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 886ec2f..a9e3779 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2453,7 +2453,7 @@ static void __journal_remove_journal_head(struct
buffer_head *bh)
J_ASSERT_JH(jh, jh->b_transaction == NULL);
J_ASSERT_JH(jh, jh->b_next_transaction == NULL);
J_ASSERT_JH(jh, jh->b_cp_transaction == NULL);
- J_ASSERT_JH(jh, jh->b_jlist == BJ_None);
+ /*J_ASSERT_JH(jh, jh->b_jlist == BJ_None);*/
J_ASSERT_BH(bh, buffer_jbd(bh));
J_ASSERT_BH(bh, jh2bh(jh) == bh);
BUFFER_TRACE(bh, "remove journal_head");
Really thanks,
- Zheng
On Mon, 13 May 2013 19:26:34 +0800, Zheng Liu <[email protected]> wrote:
> On Mon, May 13, 2013 at 09:53:25AM +0000, EUNBONG SONG wrote:
> >
> >
> > > Hi all,
> >
> > > First of all I couldn't reproduce this regression in my sand box. So
> > > the following speculation is only my guess. I suspect that the commit
> > > (ae4647fb) isn't root cause. It just uncover a potential bug that has
> > > been there for a long time. I look at the code, and found two
> > > suspicious stuff in jbd2. The first one is in do_get_write_access().
> > > In this function we forgot to lock bh state when we check b_jlist ==
> > > BJ_Shadow. I generate a patch to fix it, and I really think it is the
> > > root cause. Further, in __journal_remove_journal_head() we check
> > > b_jlist == BJ_None. But, when this function is called, bh state won't
> > > be locked sometimes. So I suspect this is why we hit a BUG in
> > > jbd2_journal_put_journal_head(). But I don't have a good solution to
> > > fix this until now because I don't know whether we need to lock bh state
> > > here, or maybe we should remove this assertation.
> > >
> > > So, generally, Tony, Eunbong, could you please try the following patch?
> > >
> > > Thanks in advance,
> > > - Zheng
> >
> >
> > Hi, I tested your patch. Unfortunately, the same problem was reproduced.
> > Thanks.
>
> Thanks for trying this patch. Could you please repost the dmesg log for
> me? I want to make sure whether the second suspicious stuff causes this
> regression or not. Further, that would be great if you could try to
> comment this line as the following?
AFAIK following assertion was triggered jh->b_transaction != NULL
>
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index 886ec2f..a9e3779 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -2453,7 +2453,7 @@ static void __journal_remove_journal_head(struct
> buffer_head *bh)
> J_ASSERT_JH(jh, jh->b_transaction == NULL);
> J_ASSERT_JH(jh, jh->b_next_transaction == NULL);
> J_ASSERT_JH(jh, jh->b_cp_transaction == NULL);
> - J_ASSERT_JH(jh, jh->b_jlist == BJ_None);
> + /*J_ASSERT_JH(jh, jh->b_jlist == BJ_None);*/
> J_ASSERT_BH(bh, buffer_jbd(bh));
> J_ASSERT_BH(bh, jh2bh(jh) == bh);
> BUFFER_TRACE(bh, "remove journal_head");
>
> Really thanks,
> - Zheng
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, I just wonder. Is there no problem with endianess.
I mean usually bit field is defined with __BIG_ENDIAN_BITFIELD or
__LITTLE_ENDIAN_BITFIELD. But b_jlist and b_modfied is defined with no
pad.
It seems to be good but i just want to make sure.
Thanks.
2013/5/13 Dmitry Monakhov <[email protected]>:
> On Mon, 13 May 2013 19:26:34 +0800, Zheng Liu <[email protected]> wrote:
>> On Mon, May 13, 2013 at 09:53:25AM +0000, EUNBONG SONG wrote:
>> >
>> >
>> > > Hi all,
>> >
>> > > First of all I couldn't reproduce this regression in my sand box. So
>> > > the following speculation is only my guess. I suspect that the commit
>> > > (ae4647fb) isn't root cause. It just uncover a potential bug that has
>> > > been there for a long time. I look at the code, and found two
>> > > suspicious stuff in jbd2. The first one is in do_get_write_access().
>> > > In this function we forgot to lock bh state when we check b_jlist ==
>> > > BJ_Shadow. I generate a patch to fix it, and I really think it is the
>> > > root cause. Further, in __journal_remove_journal_head() we check
>> > > b_jlist == BJ_None. But, when this function is called, bh state won't
>> > > be locked sometimes. So I suspect this is why we hit a BUG in
>> > > jbd2_journal_put_journal_head(). But I don't have a good solution to
>> > > fix this until now because I don't know whether we need to lock bh state
>> > > here, or maybe we should remove this assertation.
>> > >
>> > > So, generally, Tony, Eunbong, could you please try the following patch?
>> > >
>> > > Thanks in advance,
>> > > - Zheng
>> >
>> >
>> > Hi, I tested your patch. Unfortunately, the same problem was reproduced.
>> > Thanks.
>>
>> Thanks for trying this patch. Could you please repost the dmesg log for
>> me? I want to make sure whether the second suspicious stuff causes this
>> regression or not. Further, that would be great if you could try to
>> comment this line as the following?
> AFAIK following assertion was triggered jh->b_transaction != NULL
>>
>> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
>> index 886ec2f..a9e3779 100644
>> --- a/fs/jbd2/journal.c
>> +++ b/fs/jbd2/journal.c
>> @@ -2453,7 +2453,7 @@ static void __journal_remove_journal_head(struct
>> buffer_head *bh)
>> J_ASSERT_JH(jh, jh->b_transaction == NULL);
>> J_ASSERT_JH(jh, jh->b_next_transaction == NULL);
>> J_ASSERT_JH(jh, jh->b_cp_transaction == NULL);
>> - J_ASSERT_JH(jh, jh->b_jlist == BJ_None);
>> + /*J_ASSERT_JH(jh, jh->b_jlist == BJ_None);*/
>> J_ASSERT_BH(bh, buffer_jbd(bh));
>> J_ASSERT_BH(bh, jh2bh(jh) == bh);
>> BUFFER_TRACE(bh, "remove journal_head");
>>
>> Really thanks,
>> - Zheng
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
On Sun 12-05-13 13:04:59, EUNBONG SONG wrote:
>
>
> >> Since at this point it's safer to rollback the change and we can
> >> investigate more deeply how to fix it correctly for the next
> >> development cycle, this is the patch which I'm testing.
>
> >> - Ted
>
> > Hello, I've tested with your patch. But the same problem was reproduced.
> > Currently, I'm trying to git bisect. If i done git bisect, i will let you know.
>
> Hi, I've done git bisect. and panic at jbd2_journal_put_journal_head() is caused by
> ae4647fb7654676fc44a97e86eb35f9f06b99f66: "jbd2: reduce journal_head size."
> I write just code patch which revert ae4647fb7654676fc44a97e86eb35f9f06b99f66 because
> I don't know the root cause.
This is really strange. I've verified the code and all the places
modifying b_jlist or b_modified are holding bh_state lock so we should be
safe...
Hum, but I remember I was debugging similar problems with bit fields in
btrfs on ia64 as well (see http://lwn.net/Articles/478657/). So I think what
has happened is that your compiler compiled bitfield access as 64-bit and
updates to b_jcount and b_jlist / b_modified crashed into one another. I
didn't hit it because my compiler was not so "clever".
Grumble. In this case I think bitfields are not worth the trouble with gcc.
It's a pitty we have to spend additional 8 bytes for every journal_head but
we'll survive... I'll send Ted a partial revert and add a comment so that
we won't repeat this mistake in future.
Honza
> Signed-off-by: Eunbong Song <[email protected]>
> ---
> include/linux/journal-head.h | 11 +++++++++--
> 1 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/journal-head.h b/include/linux/journal-head.h
> index 13a3da2..c18b46f 100644
> --- a/include/linux/journal-head.h
> +++ b/include/linux/journal-head.h
> @@ -31,14 +31,21 @@ struct journal_head {
> /*
> * Journalling list for this buffer [jbd_lock_bh_state()]
> */
> - unsigned b_jlist:4;
> + unsigned b_jlist;
>
> /*
> * This flag signals the buffer has been modified by
> * the currently running transaction
> * [jbd_lock_bh_state()]
> */
> - unsigned b_modified:1;
> + unsigned b_modified;
> +
> + /*
> + * This feild tracks the last transaction id in which this buffer
> + * has been cowed
> + * [jbd_lock_bh_state()]
> + */
> + tid_t b_cow_tid;
>
> /*
> * Copy of the buffer data frozen for writing to the log.
> --
> 1.7.0.4
>
>
> Thanks.
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon, May 13, 2013 at 03:18:09PM +0200, Jan Kara wrote:
> Grumble. In this case I think bitfields are not worth the trouble with gcc.
> It's a pitty we have to spend additional 8 bytes for every journal_head but
> we'll survive... I'll send Ted a partial revert and add a comment so that
> we won't repeat this mistake in future.
Or just switch things to use explicit 32-bit boolean operations.
Sounds the safest way to go is to simply not trust bitfields to be
something gcc is competent to compile correctly, and just open code it
in standard C. (Large portions of ext4 and e2fsprogs do this
manually, for historical reasons, and it sounds like we have a good
reason to do it going forward.)
Jan, Dmitry --- I still have in my tree a revert for commit 4eec708d2:
ext4: use io_end for multiple bios, since I belive Dmitry still
bisected a regression for xfstests 299. Dmitry, can you confirm that
you are definitely seeing a regression here? Jan, do you mind if we
try to figure out how to fix this during the next development cycle,
since it was part of your much longer, extensive patch series anyway?
I've determined that the reason why I didn't see a problem was because
xfstests 299 was failing earlier on the baseline, and crashing my
regression tests. So I simply commented it out just so I could
complete the testing. It seems that xfstests 299 is problematic for
me, and I need to focus on how to make it pass successfully. (Dmitry,
when I revert the commit which you identified, xfstests 299 is *still*
failing for me....)
- Ted
On Mon 13-05-13 09:30:36, Ted Tso wrote:
> On Mon, May 13, 2013 at 03:18:09PM +0200, Jan Kara wrote:
> > Grumble. In this case I think bitfields are not worth the trouble with gcc.
> > It's a pitty we have to spend additional 8 bytes for every journal_head but
> > we'll survive... I'll send Ted a partial revert and add a comment so that
> > we won't repeat this mistake in future.
>
> Or just switch things to use explicit 32-bit boolean operations.
> Sounds the safest way to go is to simply not trust bitfields to be
> something gcc is competent to compile correctly, and just open code it
> in standard C. (Large portions of ext4 and e2fsprogs do this
> manually, for historical reasons, and it sounds like we have a good
> reason to do it going forward.)
Yeah, but in this case b_jlist testing / setting would require accessor
functions which is slightly ugly. For now I just submitted a revert of the
bitfield part and if someone feels like saving 8 bytes in struct
journal_head is worth the hassle, then we can later go that route.
> Jan, Dmitry --- I still have in my tree a revert for commit 4eec708d2:
> ext4: use io_end for multiple bios, since I belive Dmitry still
> bisected a regression for xfstests 299. Dmitry, can you confirm that
> you are definitely seeing a regression here? Jan, do you mind if we
> try to figure out how to fix this during the next development cycle,
> since it was part of your much longer, extensive patch series anyway?
Yeah, for now just send a revert to Linus. I'll look into that failure
now but since I didn't hit the problem in my testing it may take a while.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon, May 13, 2013 at 03:18:09PM +0200, Jan Kara wrote:
> On Sun 12-05-13 13:04:59, EUNBONG SONG wrote:
> >
> >
> > >> Since at this point it's safer to rollback the change and we can
> > >> investigate more deeply how to fix it correctly for the next
> > >> development cycle, this is the patch which I'm testing.
> >
> > >> - Ted
> >
> > > Hello, I've tested with your patch. But the same problem was reproduced.
> > > Currently, I'm trying to git bisect. If i done git bisect, i will let you know.
> >
> > Hi, I've done git bisect. and panic at jbd2_journal_put_journal_head() is caused by
> > ae4647fb7654676fc44a97e86eb35f9f06b99f66: "jbd2: reduce journal_head size."
> > I write just code patch which revert ae4647fb7654676fc44a97e86eb35f9f06b99f66 because
> > I don't know the root cause.
> This is really strange. I've verified the code and all the places
> modifying b_jlist or b_modified are holding bh_state lock so we should be
> safe...
Hi Jan,
Could you please take a look at this mail [1]. I don't think we hold
bh_state lock there.
1. http://www.spinics.net/lists/linux-ext4/msg38205.html
Thanks,
- Zheng
On Mon, 13 May 2013 09:30:36 -0400, Theodore Ts'o <[email protected]> wrote:
> On Mon, May 13, 2013 at 03:18:09PM +0200, Jan Kara wrote:
> > Grumble. In this case I think bitfields are not worth the trouble with gcc.
> > It's a pitty we have to spend additional 8 bytes for every journal_head but
> > we'll survive... I'll send Ted a partial revert and add a comment so that
> > we won't repeat this mistake in future.
>
> Or just switch things to use explicit 32-bit boolean operations.
> Sounds the safest way to go is to simply not trust bitfields to be
> something gcc is competent to compile correctly, and just open code it
> in standard C. (Large portions of ext4 and e2fsprogs do this
> manually, for historical reasons, and it sounds like we have a good
> reason to do it going forward.)
>
> Jan, Dmitry --- I still have in my tree a revert for commit 4eec708d2:
> ext4: use io_end for multiple bios, since I belive Dmitry still
> bisected a regression for xfstests 299. Dmitry, can you confirm that
> you are definitely seeing a regression here?
Yes, this patch provoke use-after-free which detected by slab sanity checks.
> Jan, do you mind if we
> try to figure out how to fix this during the next development cycle,
> since it was part of your much longer, extensive patch series anyway?
>
> I've determined that the reason why I didn't see a problem was because
> xfstests 299 was failing earlier on the baseline, and crashing my
> regression tests. So I simply commented it out just so I could
> complete the testing. It seems that xfstests 299 is problematic for
> me, and I need to focus on how to make it pass successfully. (Dmitry,
> when I revert the commit which you identified, xfstests 299 is *still*
> failing for me....)
In fact generic/299 always succeed for me, but it produce warning
WARNING: at fs/ext4/inode.c:3218 ext4_ext_direct_IO
and complains from slab debug. But it was missed because i've missed
this error in the logs and forget to check /proc/sys/kernel/tained.
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
On Mon, May 13, 2013 at 05:47:05PM +0400, Dmitry Monakhov wrote:
> In fact generic/299 always succeed for me, but it produce warning
> WARNING: at fs/ext4/inode.c:3218 ext4_ext_direct_IO
> and complains from slab debug. But it was missed because i've missed
> this error in the logs and forget to check /proc/sys/kernel/tained.
This is how it's failing for me....
generic/299 [23:06:27][ 109.243220] fio (3376) used greatest stack depth: 5240 bytes left
[ 109.252757] fio (3380) used greatest stack depth: 5140 bytes left
[ 109.307227] fio (3374) used greatest stack depth: 4944 bytes left
[23:08:10] [failed, exit status 1] - output mismatch (see /root/xfstests/results/generic/299.out.bad)
--- tests/generic/299.out 2013-04-05 21:41:17.000000000 -0400
+++ /root/xfstests/results/generic/299.out.bad 2013-05-11 23:08:10.835356876 -0400
@@ -3,3 +3,6 @@
Run fio with random aio-dio pattern
Start fallocate/truncate loop
+./common/rc: line 2055: 3335 Segmentation fault "$@" >> $seqres.full 2>&1
+failed: '/root/xfstests/bin/fio /tmp/3135-299.fio'
+(see /root/xfstests/results/generic/299.full for details)
...
(Run 'diff -u tests/generic/299.out /root/xfstests/results/generic/299.out.bad' to see the entire diff)
I haven't had a chance to investigate the core dump yet....
- Ted
On Mon, 13 May 2013 09:52:21 -0400, Theodore Ts'o <[email protected]> wrote:
> On Mon, May 13, 2013 at 05:47:05PM +0400, Dmitry Monakhov wrote:
> > In fact generic/299 always succeed for me, but it produce warning
> > WARNING: at fs/ext4/inode.c:3218 ext4_ext_direct_IO
> > and complains from slab debug. But it was missed because i've missed
> > this error in the logs and forget to check /proc/sys/kernel/tained.
>
> This is how it's failing for me....
Because you ask questions, but do not read answers :)
http://marc.info/?l=linux-ext4&m=136580060822252&w=2
http://marc.info/?l=linux-ext4&m=136610044500931&w=2
>
> generic/299 [23:06:27][ 109.243220] fio (3376) used greatest stack depth: 5240 bytes left
> [ 109.252757] fio (3380) used greatest stack depth: 5140 bytes left
> [ 109.307227] fio (3374) used greatest stack depth: 4944 bytes left
> [23:08:10] [failed, exit status 1] - output mismatch (see /root/xfstests/results/generic/299.out.bad)
> --- tests/generic/299.out 2013-04-05 21:41:17.000000000 -0400
> +++ /root/xfstests/results/generic/299.out.bad 2013-05-11 23:08:10.835356876 -0400
> @@ -3,3 +3,6 @@
> Run fio with random aio-dio pattern
>
> Start fallocate/truncate loop
> +./common/rc: line 2055: 3335 Segmentation fault "$@" >> $seqres.full 2>&1
> +failed: '/root/xfstests/bin/fio /tmp/3135-299.fio'
> +(see /root/xfstests/results/generic/299.full for details)
> ...
> (Run 'diff -u tests/generic/299.out /root/xfstests/results/generic/299.out.bad' to see the entire diff)
>
> I haven't had a chance to investigate the core dump yet....
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon 13-05-13 21:56:43, Zheng Liu wrote:
> On Mon, May 13, 2013 at 03:18:09PM +0200, Jan Kara wrote:
> > On Sun 12-05-13 13:04:59, EUNBONG SONG wrote:
> > >
> > >
> > > >> Since at this point it's safer to rollback the change and we can
> > > >> investigate more deeply how to fix it correctly for the next
> > > >> development cycle, this is the patch which I'm testing.
> > >
> > > >> - Ted
> > >
> > > > Hello, I've tested with your patch. But the same problem was reproduced.
> > > > Currently, I'm trying to git bisect. If i done git bisect, i will let you know.
> > >
> > > Hi, I've done git bisect. and panic at jbd2_journal_put_journal_head() is caused by
> > > ae4647fb7654676fc44a97e86eb35f9f06b99f66: "jbd2: reduce journal_head size."
> > > I write just code patch which revert ae4647fb7654676fc44a97e86eb35f9f06b99f66 because
> > > I don't know the root cause.
> > This is really strange. I've verified the code and all the places
> > modifying b_jlist or b_modified are holding bh_state lock so we should be
> > safe...
>
> Hi Jan,
>
> Could you please take a look at this mail [1]. I don't think we hold
> bh_state lock there.
>
> 1. http://www.spinics.net/lists/linux-ext4/msg38205.html
I'll also reply to that thread but: Yes, we don't hold bh_state lock when
reading b_jlist in that one case (that's why I wrote 'modify' and not just
'access' in my previous email). But that's really harmless since we don't
do any complex operations with b_jlist (only get & set) so we either see an
old value or a new one. And that particular use is going away anyway later
in my series.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon, May 13, 2013 at 05:17:27PM +0200, Jan Kara wrote:
> On Mon 13-05-13 21:56:43, Zheng Liu wrote:
> > On Mon, May 13, 2013 at 03:18:09PM +0200, Jan Kara wrote:
> > > On Sun 12-05-13 13:04:59, EUNBONG SONG wrote:
> > > >
> > > >
> > > > >> Since at this point it's safer to rollback the change and we can
> > > > >> investigate more deeply how to fix it correctly for the next
> > > > >> development cycle, this is the patch which I'm testing.
> > > >
> > > > >> - Ted
> > > >
> > > > > Hello, I've tested with your patch. But the same problem was reproduced.
> > > > > Currently, I'm trying to git bisect. If i done git bisect, i will let you know.
> > > >
> > > > Hi, I've done git bisect. and panic at jbd2_journal_put_journal_head() is caused by
> > > > ae4647fb7654676fc44a97e86eb35f9f06b99f66: "jbd2: reduce journal_head size."
> > > > I write just code patch which revert ae4647fb7654676fc44a97e86eb35f9f06b99f66 because
> > > > I don't know the root cause.
> > > This is really strange. I've verified the code and all the places
> > > modifying b_jlist or b_modified are holding bh_state lock so we should be
> > > safe...
> >
> > Hi Jan,
> >
> > Could you please take a look at this mail [1]. I don't think we hold
> > bh_state lock there.
> >
> > 1. http://www.spinics.net/lists/linux-ext4/msg38205.html
> I'll also reply to that thread but: Yes, we don't hold bh_state lock when
> reading b_jlist in that one case (that's why I wrote 'modify' and not just
> 'access' in my previous email). But that's really harmless since we don't
> do any complex operations with b_jlist (only get & set) so we either see an
> old value or a new one. And that particular use is going away anyway later
> in my series.
I see. Thanks for your explanation.
- Zheng
On 5/12/13 4:01 AM, Dmitry Monakhov wrote:
> In fact '4eec70' are vexing because I have reviewed and tested this patch before
> it was marked as Review-by, but missed the bug. This is because xfstests
> was executed manually logs was full of warnings but tainted flag was not
> checked at the end.
Can you elaborate on this? What was logged, and is it something we could
try to pick up post-test in xfstests?
Thanks,
-Eric
On Mon 13-05-13 11:34:12, Eric Sandeen wrote:
> On 5/12/13 4:01 AM, Dmitry Monakhov wrote:
> > In fact '4eec70' are vexing because I have reviewed and tested this patch before
> > it was marked as Review-by, but missed the bug. This is because xfstests
> > was executed manually logs was full of warnings but tainted flag was not
> > checked at the end.
>
> Can you elaborate on this? What was logged, and is it something we could
> try to pick up post-test in xfstests?
Generally I think it might be useful if xfstests would fail / warn if
kernel became tainted during the test (e.g. due to WARN_ON or oops, or
something like that). It should be even relatively easy to implement
(just compare /proc/sys/kernel/tainted before and after each test).
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On 5/13/13 12:01 PM, Jan Kara wrote:
> On Mon 13-05-13 11:34:12, Eric Sandeen wrote:
>> On 5/12/13 4:01 AM, Dmitry Monakhov wrote:
>>> In fact '4eec70' are vexing because I have reviewed and tested this patch before
>>> it was marked as Review-by, but missed the bug. This is because xfstests
>>> was executed manually logs was full of warnings but tainted flag was not
>>> checked at the end.
>>
>> Can you elaborate on this? What was logged, and is it something we could
>> try to pick up post-test in xfstests?
> Generally I think it might be useful if xfstests would fail / warn if
> kernel became tainted during the test (e.g. due to WARN_ON or oops, or
> something like that). It should be even relatively easy to implement
> (just compare /proc/sys/kernel/tainted before and after each test).
>
> Honza
>
Ah, right. That should be easy, I'll see if I can cook that up.
Thanks,
-Eric
On Mon, May 13, 2013 at 05:59:25PM +0400, Dmitry Monakhov wrote:
> > This is how it's failing for me....
> Because you ask questions, but do not read answers :)
> http://marc.info/?l=linux-ext4&m=136580060822252&w=2
> http://marc.info/?l=linux-ext4&m=136610044500931&w=2
Sorry, I thought I was running with the updated fio, but apparently I
was using an old xfstests.tar.gz tarball on my KVM image. Mea culpa....
- Ted
On Mon, 13 May 2013 12:09:22 -0500, Eric Sandeen <[email protected]> wrote:
> On 5/13/13 12:01 PM, Jan Kara wrote:
> > On Mon 13-05-13 11:34:12, Eric Sandeen wrote:
> >> On 5/12/13 4:01 AM, Dmitry Monakhov wrote:
> >>> In fact '4eec70' are vexing because I have reviewed and tested this patch before
> >>> it was marked as Review-by, but missed the bug. This is because xfstests
> >>> was executed manually logs was full of warnings but tainted flag was not
> >>> checked at the end.
> >>
> >> Can you elaborate on this? What was logged, and is it something we could
> >> try to pick up post-test in xfstests?
> > Generally I think it might be useful if xfstests would fail / warn if
> > kernel became tainted during the test (e.g. due to WARN_ON or oops, or
> > something like that). It should be even relatively easy to implement
> > (just compare /proc/sys/kernel/tainted before and after each test).
> >
> > Honza
> >
>
> Ah, right. That should be easy, I'll see if I can cook that up.
Also we can use abrt's kernel-oops handler to collect messages.
>
> Thanks,
> -Eric
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
On 5/14/13 2:11 AM, Dmitry Monakhov wrote:
> On Mon, 13 May 2013 12:09:22 -0500, Eric Sandeen <[email protected]> wrote:
>> On 5/13/13 12:01 PM, Jan Kara wrote:
>>> On Mon 13-05-13 11:34:12, Eric Sandeen wrote:
>>>> On 5/12/13 4:01 AM, Dmitry Monakhov wrote:
>>>>> In fact '4eec70' are vexing because I have reviewed and tested this patch before
>>>>> it was marked as Review-by, but missed the bug. This is because xfstests
>>>>> was executed manually logs was full of warnings but tainted flag was not
>>>>> checked at the end.
>>>>
>>>> Can you elaborate on this? What was logged, and is it something we could
>>>> try to pick up post-test in xfstests?
>>> Generally I think it might be useful if xfstests would fail / warn if
>>> kernel became tainted during the test (e.g. due to WARN_ON or oops, or
>>> something like that). It should be even relatively easy to implement
>>> (just compare /proc/sys/kernel/tainted before and after each test).
>>>
>>> Honza
>>>
>>
>> Ah, right. That should be easy, I'll see if I can cook that up.
> Also we can use abrt's kernel-oops handler to collect messages.
I sent a pretty simple patch to just check the sysctl to the xfs list
yesterday.
-Eric
On Sun 12-05-13 13:01:11, Dmitry Monakhov wrote:
> On Sat, 11 May 2013 19:05:59 -0400, Theodore Ts'o <[email protected]> wrote:
> > On Sat, May 11, 2013 at 03:00:53PM +0400, Dmitry Monakhov wrote:
> > > I've bisected ext4 related issue. It is appeared that it is pure ext4
> > > specific. Regression caused by following commit
> > > commit 4eec708d263f0ee10861d69251708a225b64cac7
> > > Author: Jan Kara <[email protected]>
> > > Date: Thu Apr 11 23:56:53 2013 -0400
> > > ext4: use io_end for multiple bios
> >
> > Hmm... the next question is why did I see this in my testing. Did you
> > find this on ia64, or x86?
> No simple x86_64.
> > Also what about the slab corruption which
> > you saw when running XFS; was that unrelated?
> My theory about mysterious corruption in mm layer which broke everything
> was wrong. We have to absolutely independent regressions in different
> filesystems:
> * Slub corruption on XFS
> - testcase: xfstests/generic/007
> - bad commit: 666d64
> - LINK: https://lkml.org/lkml/2013/5/11/154
>
> * Slub corruption on EXT4
> - testcase: xfstests/generic/299
> - bad commit: 4eec70
> - link: https://lkml.org/lkml/2013/5/11/37
> - fix: https://lkml.org/lkml/2013/5/11/142
>
> In fact '4eec70' are vexing because I have reviewed and tested this patch
> before it was marked as Review-by, but missed the bug. This is because
> xfstests was executed manually logs was full of warnings but tainted flag
> was not checked at the end. To prevent this thing in future I'll always
> use my local autotest(http://autotest.github.io/) farm next time.
OK, so I finally nailed this down. DIO code has that peculiarity that it
doesn't call ->end_io callback when no IO was actually submitted. As a
bonus the generic code does a cleanup of generic stuff that is otherwise left
to ->end_io callback. Since I need to do io_end cleanup in ->end_io callback
I have to compensate for that in ext4_ext_direct_IO(). Sadly I've got the
condition wrong and also forgot that generic code already decremented
i_dio_count in the error failure case so cleanup sometimes happened twice.
I'll add fixed version of the patch back in my series (since the patch got
reverted upstream).
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR