2009-03-04 05:09:21

by bugme-daemon

Subject: [Bug 12815] New: JBD: barrier-based sync failed on dm-1:8 - disabling barriers -- and then hang

http://bugzilla.kernel.org/show_bug.cgi?id=12815

Summary: JBD: barrier-based sync failed on dm-1:8 - disabling barriers -- and then hang
Product: File System
Version: 2.5
KernelVersion: 2.6.28
Platform: All
OS/Version: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: ext4
AssignedTo: [email protected]
ReportedBy: [email protected]


Latest working kernel version: none
Earliest failing kernel version: 2.6.28.1
Distribution: Debian unstable
Hardware Environment: Thecus N2100 (arm), two 1 TB SATA drives
Software Environment: /home is an ext4 filesystem on an LVM of the two drives
Problem Description:

This has happened at least three times since I converted the filesystem from
ext3 to ext4 two days ago. All processes accessing the filesystem go into
uninterruptible sleep. The system does not crash. On one occasion, the problem
cleared itself up after a few minutes and processes resumed running. On the
other two, I hard-reset the machine after 8 hours, and half an hour,
respectively.

The only thing in the log is this:

JBD: barrier-based sync failed on dm-1:8 - disabling barriers

It was never logged in the month before switching to ext4, and has been logged
seven times in the past two days. I have not matched up all seven times
exactly with the hangs, but some of them match up very well.
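
One way to match the log entries against the hangs is to pull the timestamp of
every occurrence out of the kernel log and compare those times with when the
hangs were noticed. The following is a minimal Python sketch, assuming classic
syslog-format lines ("Mon DD HH:MM:SS host ..."); the function name and the
sample lines are illustrative and are not taken from the reporter's logs.

```python
import re

# Message text as it appears in the report above.
JBD_MESSAGE = "JBD: barrier-based sync failed"

# Assumed syslog prefix: "Mon DD HH:MM:SS host ..." (layout is an assumption).
SYSLOG_PREFIX = re.compile(r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ")


def jbd_barrier_timestamps(lines):
    """Return the syslog timestamp of every line carrying the JBD message."""
    stamps = []
    for line in lines:
        if JBD_MESSAGE not in line:
            continue
        match = SYSLOG_PREFIX.match(line)
        # Lines without a recognisable prefix are kept, marked as "unknown".
        stamps.append(match.group("stamp") if match else "unknown")
    return stamps


# Made-up sample lines, for illustration only.
sample = [
    "Mar  3 01:12:40 host kernel: JBD: barrier-based sync failed on dm-1:8 - disabling barriers",
    "Mar  3 01:13:02 host kernel: some unrelated message",
    "Mar  3 14:55:09 host kernel: JBD: barrier-based sync failed on dm-1:8 - disabling barriers",
]
print(jbd_barrier_timestamps(sample))
```

In practice the lines would come from the system's kernel log file, whose path
depends on the syslog configuration.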

Steps to reproduce:

Unsure, but heavy disk load does not seem to help; at one point rtorrent was
hashing a large file when the hang occurred.

This may be a duplicate of bug 12679 -- at least, the symptoms as described are
the same, and the backtrace obtained there shows a hang in
jbd2_journal_commit_transaction, which calls the function from which my message
originates.


I am running a Debian kernel, not mainline. Apologies, but I don't have a head
on this machine.


--
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


2009-03-04 05:30:01

by bugme-daemon

Subject: [Bug 12815] JBD: barrier-based sync failed on dm-1:8 - disabling barriers -- and then hang

http://bugzilla.kernel.org/show_bug.cgi?id=12815





------- Comment #1 from [email protected] 2009-03-03 21:29 -------
ext4 has barriers on by default, so you'll see the disabling barriers message
on lvm every time you mount (it does not support barriers).

If you'd like to rule that out, mount with -o barrier=0

When you're hung, try echo w > /proc/sysrq-trigger (or SysRq-W) to get a list
of sleeping processes.

-Eric



2009-03-04 05:33:35

by bugme-daemon

Subject: [Bug 12815] JBD: barrier-based sync failed on dm-1:8 - disabling barriers -- and then hang

http://bugzilla.kernel.org/show_bug.cgi?id=12815





------- Comment #2 from [email protected] 2009-03-03 21:33 -------
Depending on what you see from sysrq, this is probably fixed by:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=2acf2c261b823d9d9ed954f348b97620297a36b5

which may make it to .28.y eventually...



2009-03-04 17:04:52

by bugme-daemon

Subject: [Bug 12815] JBD: barrier-based sync failed on dm-1:8 - disabling barriers -- and then hang

http://bugzilla.kernel.org/show_bug.cgi?id=12815





------- Comment #3 from [email protected] 2009-03-04 09:04 -------
I'm getting the JBD message not only on initial mount. Is that still expected?

I'll try to get some info from sysrq if it happens again.

Is there any mount option I can use to disable delayed allocation or something
to try to work around the problem?



2009-03-04 17:15:34

by bugme-daemon

Subject: [Bug 12815] JBD: barrier-based sync failed on dm-1:8 - disabling barriers -- and then hang

http://bugzilla.kernel.org/show_bug.cgi?id=12815





------- Comment #4 from [email protected] 2009-03-04 09:15 -------
(In reply to comment #3)
> I'm getting the JBD message not only on initial mount. Is that still expected?

You should get it on each mount. When else do you see it?

> I'll try to get some info from sysrq if it happens again.
>
> Is there any mount option I can use to disable delayed allocation or something
> to try to work around the problem?

You can mount with -o nodelalloc, though there is no reason to think that this
is related to the problem at this point ... it may be more productive to test a
newer kernel.



2009-03-04 17:41:20

by bugme-daemon

Subject: [Bug 12815] JBD: barrier-based sync failed on dm-1:8 - disabling barriers -- and then hang

http://bugzilla.kernel.org/show_bug.cgi?id=12815





------- Comment #5 from [email protected] 2009-03-04 09:40 -------
> You should get it on each mount. When else do you see it?

I do see the jbd message on each mount. But I also saw one 2.5 hours after
mount. But, probably a red herring.



2009-03-07 17:08:20

by bugme-daemon

Subject: [Bug 12815] JBD: barrier-based sync failed on dm-1:8 - disabling barriers -- and then hang

http://bugzilla.kernel.org/show_bug.cgi?id=12815





------- Comment #6 from [email protected] 2009-03-07 09:08 -------
I added barrier=0 and did not see the problem again until today. This time the
hang came without any JBD messages, so those were a red herring, and there was
nothing special in dmesg.

sysrq shows the following. This is not all the hung processes, but should be
representative:

Mar 7 12:00:54 turtle kernel: [82352.940000] sh D c0213428 0
21596 21594
Mar 7 12:00:54 turtle kernel: [82352.940000] [<c02130b4>] (schedule+0x0/0x3d0)
from [<c0213b94>] (__mutex_lock_slowpath+0x6c/0x98)
Mar 7 12:00:54 turtle kernel: [82352.940000] [<c0213b28>]
(__mutex_lock_slowpath+0x0/0x98) from [<c0213be0>] (mutex_lock+0x20/0x24)
Mar 7 12:00:54 turtle kernel: [82352.940000] r8:c76d379c r7:c223de98
r6:c76d3728 r5:c7685d98 r4:00000000
Mar 7 12:00:54 turtle kernel: [82352.940000] [<c0213bc0>]
(mutex_lock+0x0/0x24) from [<c009e41c>] (do_lookup+0x78/0x194)
Mar 7 12:00:54 turtle kernel: [82352.940000] [<c009e3a4>]
(do_lookup+0x0/0x194) from [<c00a0198>] (__link_path_walk+0x3ac/0xe24)
Mar 7 12:00:54 turtle kernel: [82352.940000] [<c009fdec>]
(__link_path_walk+0x0/0xe24) from [<c00a0d98>] (path_walk+0x50/0xa0)
Mar 7 12:00:54 turtle kernel: [82352.940000] [<c00a0d48>] (path_walk+0x0/0xa0)
from [<c00a0edc>] (do_path_lookup+0xf4/0x11c)
Mar 7 12:00:54 turtle kernel: [82352.940000] r6:ffffff9c r5:c223de98
r4:00000001
Mar 7 12:00:54 turtle kernel: [82352.940000] [<c00a0de8>]
(do_path_lookup+0x0/0x11c) from [<c00a198c>] (user_path_at+0x5c/0x9c)
Mar 7 12:00:54 turtle kernel: [82352.940000] r7:c223df08 r6:ffffff9c
r5:00000001 r4:c68d9000
Mar 7 12:00:54 turtle kernel: [82352.940000] [<c00a1930>]
(user_path_at+0x0/0x9c) from [<c009a354>] (vfs_stat_fd+0x24/0x54)
Mar 7 12:00:54 turtle kernel: [82352.940000] r7:000000c3 r6:c223df08
r5:c223df40 r4:bed3da18
Mar 7 12:00:54 turtle kernel: [82352.940000] [<c009a330>]
(vfs_stat_fd+0x0/0x54) from [<c009a438>] (vfs_stat+0x1c/0x20)
Mar 7 12:00:54 turtle kernel: [82352.940000] r6:000c5668 r5:c223df40
r4:bed3da18
Mar 7 12:00:54 turtle kernel: [82352.940000] [<c009a41c>] (vfs_stat+0x0/0x20)
from [<c009a45c>] (sys_stat64+0x20/0x3c)
Mar 7 12:00:54 turtle kernel: [82352.940000] [<c009a43c>]
(sys_stat64+0x0/0x3c) from [<c0025e00>] (ret_fast_syscall+0x0/0x3c)
Mar 7 12:00:54 turtle kernel: [82352.940000] r5:000bd6f8 r4:000c5668
Mar 7 12:00:54 turtle kernel: [82352.940000] sh D c0213428 0
22291 22289
Mar 7 12:00:54 turtle kernel: [82352.940000] [<c02130b4>] (schedule+0x0/0x3d0)
from [<c0213b94>] (__mutex_lock_slowpath+0x6c/0x98)
Mar 7 12:00:54 turtle kernel: [82352.940000] [<c0213b28>]
(__mutex_lock_slowpath+0x0/0x98) from [<c0213be0>] (mutex_lock+0x20/0x24)
Mar 7 12:00:54 turtle kernel: [82352.940000] r8:c76d379c r7:c4079e98
r6:c76d3728 r5:c7685d98 r4:00000000
Mar 7 12:00:54 turtle kernel: [82352.940000] [<c0213bc0>]
(mutex_lock+0x0/0x24) from [<c009e41c>] (do_lookup+0x78/0x194)
Mar 7 12:00:54 turtle kernel: [82352.940000] [<c009e3a4>]
(do_lookup+0x0/0x194) from [<c00a0198>] (__link_path_walk+0x3ac/0xe24)
Mar 7 12:00:54 turtle kernel: [82352.940000] [<c009fdec>]
(__link_path_walk+0x0/0xe24) from [<c00a0d98>] (path_walk+0x50/0xa0)
Mar 7 12:00:54 turtle kernel: [82352.940000] [<c00a0d48>] (path_walk+0x0/0xa0)
from [<c00a0edc>] (do_path_lookup+0xf4/0x11c)
Mar 7 12:00:54 turtle kernel: [82352.940000] r6:ffffff9c r5:c4079e98
r4:00000001
Mar 7 12:00:54 turtle kernel: [82352.940000] [<c00a0de8>]
(do_path_lookup+0x0/0x11c) from [<c00a198c>] (user_path_at+0x5c/0x9c)
Mar 7 12:00:54 turtle kernel: [82352.940000] r7:c4079f08 r6:ffffff9c
r5:00000001 r4:c6841000
Mar 7 12:00:54 turtle kernel: [82352.940000] [<c00a1930>]
(user_path_at+0x0/0x9c) from [<c009a354>] (vfs_stat_fd+0x24/0x54)
Mar 7 12:00:54 turtle kernel: [82352.940000] r7:000000c3 r6:c4079f08
r5:c4079f40 r4:be967a18
Mar 7 12:00:54 turtle kernel: [82352.940000] [<c009a330>]
(vfs_stat_fd+0x0/0x54) from [<c009a438>] (vfs_stat+0x1c/0x20)
Mar 7 12:00:54 turtle kernel: [82352.940000] r6:000c5668 r5:c4079f40
r4:be967a18
Mar 7 12:00:54 turtle kernel: [82352.940000] [<c009a41c>] (vfs_stat+0x0/0x20)
from [<c009a45c>] (sys_stat64+0x20/0x3c)
Mar 7 12:00:54 turtle kernel: [82352.940000] [<c009a43c>]
(sys_stat64+0x0/0x3c) from [<c0025e00>] (ret_fast_syscall+0x0/0x3c)
Mar 7 12:00:54 turtle kernel: [82352.940000] r5:000bd6f8 r4:000c5668
Mar 7 12:00:54 turtle kernel: [82352.940000] sh D c0213428 0
24027 24025



2009-03-09 15:45:18

by bugme-daemon

Subject: [Bug 12815] JBD: barrier-based sync failed on dm-1:8 - disabling barriers -- and then hang

http://bugzilla.kernel.org/show_bug.cgi?id=12815





------- Comment #7 from [email protected] 2009-03-09 08:45 -------
please attach the entire output of the sysrq-w command, so that we can sort out
what's relevant. Attaching it rather than pasting it in will have the added
advantage of not wrapping the output :)
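
If only a wrapped paste is available, the broken lines can be rejoined
mechanically before the trace is read. The following is a minimal Python
sketch, assuming every real log record starts with a classic syslog timestamp
and that any other non-blank line is the wrapped tail of the record before it.
The function name is hypothetical, and the pieces are joined with a single
space, so the original spacing is not necessarily reproduced.

```python
import re

# Assumed start-of-record marker: a syslog timestamp such as "Mar 7 12:00:54 ".
RECORD_START = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} ")


def unwrap_syslog(lines):
    """Rejoin continuation lines onto the syslog record they were wrapped from."""
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            # No timestamp: treat the line as the tail of the previous record.
            records[-1] += " " + line.strip()
    return records


# Three lines in the same shape as the wrapped sysrq paste in this thread.
wrapped = [
    "Mar 7 12:00:54 turtle kernel: [82352.940000] [<c02130b4>] (schedule+0x0/0x3d0)",
    "from [<c0213b94>] (__mutex_lock_slowpath+0x6c/0x98)",
    "Mar 7 12:00:54 turtle kernel: [82352.940000] r5:000bd6f8 r4:000c5668",
]
for record in unwrap_syslog(wrapped):
    print(record)
```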

Your trace snippets don't show any ext4 code in those callchains. But they may
be waiting on some stuck ext4 process which you didn't include.

(the summary probably should be changed, as this has nothing to do w/
barriers...)

Thanks,
-Eric



2009-05-19 18:43:22

by bugzilla-daemon

Subject: [Bug 12815] JBD: barrier-based sync failed on dm-1:8 - disabling barriers -- and then hang

http://bugzilla.kernel.org/show_bug.cgi?id=12815


Theodore Tso <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected]
Regression|--- |No




--- Comment #8 from Theodore Tso <[email protected]> 2009-05-19 18:43:22 ---
Hi Joey,

Are you still seeing this problem? Can you reproduce it, especially on a
newer/more recent kernel?


2009-08-26 18:08:54

by bugzilla-daemon

Subject: [Bug 12815] JBD: barrier-based sync failed on dm-1:8 - disabling barriers -- and then hang

http://bugzilla.kernel.org/show_bug.cgi?id=12815


Valerie Aurora <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected]




--- Comment #9 from Valerie Aurora <[email protected]> 2009-08-26 18:08:54 ---
Can this bug be closed? No response from the submitter for 5 months, and it
does kind of smell like a hardware problem.


2010-01-13 21:13:05

by bugzilla-daemon

Subject: [Bug 12815] JBD: barrier-based sync failed on dm-1:8 - disabling barriers -- and then hang

http://bugzilla.kernel.org/show_bug.cgi?id=12815


Yan-Fa Li <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected]




--- Comment #10 from Yan-Fa Li <[email protected]> 2010-01-13 21:13:00 ---
Found this bug via a Google search. I'm running a 2.6.32 kernel, and recently
converted an ext3 partition running on soft raid 5 to ext4. Today I found this
in my dmesg:

[ 119.414297] JBD: barrier-based sync failed on md1-8 - disabling barriers

Could this be triggered by turning off write caching on the individual drives?
I use hdparm -W0 on all the RAID drives for improved data integrity. Is it
safe to turn write caching back on with write barriers?

I created the fs using e2fsprogs-1.41.9 with the default mkfs.ext4 options:

ext4 = {
        features = has_journal,extents,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
        inode_size = 256
}

The device is a simple RAID5 running across 3 disks.

#cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active raid5 sdc1[2] sdb1[1] sda1[0]
162754304 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
bitmap: 0/156 pages [0KB], 256KB chunk

md10 : active raid5 sdc5[2] sdb5[1] sda5[0]
1302389248 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
bitmap: 1/156 pages [4KB], 2048KB chunk

md0 : active raid1 sdg2[0] sdd2[1]
17818560 blocks [2/2] [UU]
bitmap: 2/136 pages [8KB], 64KB chunk

unused devices: <none>

This is a plain RAID5, with ext4 running directly on the md device.
dumpe2fs:

Filesystem volume name: /home
Last mounted on: /home
Filesystem UUID: 02cb8e4a-d8cf-4e8d-80e7-2fa2eb309db1
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype
needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg
dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 10174464
Block count: 40688576
Reserved block count: 2034428
Free blocks: 17097770
Free inodes: 9910627
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 1014
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Flex block group size: 16
Filesystem created: Tue Jan 12 22:44:15 2010
Last mount time: Wed Jan 13 00:26:54 2010
Last write time: Wed Jan 13 00:26:54 2010
Mount count: 3
Maximum mount count: 28
Last checked: Tue Jan 12 22:44:15 2010
Check interval: 15552000 (6 months)
Next check after: Sun Jul 11 23:44:15 2010
Lifetime writes: 90 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: ee5d2952-78e6-4884-bccf-e7c41411e38b
Journal backup: inode blocks
Journal size: 128M
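
As a quick cross-check that this dumpe2fs output describes the filesystem on
md1 (the device named in the JBD message), the filesystem size can be compared
with the md1 size from /proc/mdstat. The following is a minimal Python sketch
using only figures copied from the output above; that /proc/mdstat counts in
1 KiB blocks is an assumption stated in the code.

```python
# Figures copied from the dumpe2fs and /proc/mdstat output above.
BLOCK_COUNT = 40688576      # dumpe2fs "Block count"
BLOCK_SIZE = 4096           # dumpe2fs "Block size", in bytes
MD1_BLOCKS = 162754304      # /proc/mdstat size of md1

# Assumption: /proc/mdstat reports device sizes in 1 KiB blocks.
MDSTAT_BLOCK_BYTES = 1024

fs_bytes = BLOCK_COUNT * BLOCK_SIZE
md1_bytes = MD1_BLOCKS * MDSTAT_BLOCK_BYTES

print(f"filesystem: {fs_bytes} bytes ({fs_bytes / 2**30:.1f} GiB)")
print(f"md1 device: {md1_bytes} bytes ({md1_bytes / 2**30:.1f} GiB)")
print("sizes match" if fs_bytes == md1_bytes else "sizes differ")
```

Under that assumption the two sizes come out identical, which is consistent
with the filesystem filling md1 directly, with no LVM layer in between.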


2010-01-13 21:15:29

by bugzilla-daemon

Subject: [Bug 12815] JBD: barrier-based sync failed on dm-1:8 - disabling barriers -- and then hang

http://bugzilla.kernel.org/show_bug.cgi?id=12815


Eric Sandeen <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected]
AssignedTo|[email protected] |[email protected]



