2013-07-30 22:41:40

by Sage Weil

[permalink] [raw]
Subject: crash in __jbd2_journal_file_buffer

Hi everyone,

Hit this on 3.10. Is this a known issue?

Thanks-
sage


Stack traceback for pid 23944
0xffff8802206edeb0 23944 23840 1 2 R 0xffff8802206ee338
*ceph-osd
ffff88020bf17b78 0000000000000018 ffffffff81267398 ffff8802256d40c0
ffff88020b5a1230 ffff88020bf17bb8 ffffffff81325d9c ffff88020b5a1230
ffff8802256d40c0 ffff88020b5a1230 00000000698a8d24 ffff88020bf17be8
Call Trace:
[<ffffffff81267398>] ? __jbd2_journal_file_buffer+0x188/0x260
[<ffffffff81325d9c>] ? do_raw_spin_lock+0x10c/0x150
[<ffffffff81268118>] ? do_get_write_access+0x448/0x650
[<ffffffff81213caf>] ? ext4_read_inode_bitmap+0x9f/0x5f0
[<ffffffff81167849>] ? kmem_cache_alloc+0x39/0x160
[<ffffffff81268490>] ? jbd2_journal_get_write_access+0x30/0x50
[<ffffffff81244d43>] ? __ext4_journal_get_write_access+0x43/0x90
[<ffffffff812143d7>] ? ext4_free_inode+0x1d7/0x5d0
[<ffffffff8121e601>] ? ext4_evict_inode+0x341/0x4d0
[<ffffffff8121b268>] ? ext4_mark_inode_dirty+0x88/0x230
[<ffffffff8121e610>] ? ext4_evict_inode+0x350/0x4d0
[<ffffffff8118a098>] ? evict+0xb8/0x1c0
[<ffffffff8118a9d5>] ? iput+0x105/0x190
[<ffffffff8117d341>] ? do_unlinkat+0x201/0x270
[<ffffffff8131e9be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff81180356>] ? SyS_unlink+0x16/0x20



2013-07-31 19:02:48

by Jan Kara

[permalink] [raw]
Subject: Re: crash in __jbd2_journal_file_buffer

Hello,

On Tue 30-07-13 15:41:40, Sage Weil wrote:
> Hit this on 3.10. Is this a known issue?
No, I haven't seen it. Why did the kernel crash?

Honza
>
> Thanks-
> sage
>
>
> Stack traceback for pid 23944
> 0xffff8802206edeb0 23944 23840 1 2 R 0xffff8802206ee338
> *ceph-osd
> ffff88020bf17b78 0000000000000018 ffffffff81267398 ffff8802256d40c0
> ffff88020b5a1230 ffff88020bf17bb8 ffffffff81325d9c ffff88020b5a1230
> ffff8802256d40c0 ffff88020b5a1230 00000000698a8d24 ffff88020bf17be8
> Call Trace:
> [<ffffffff81267398>] ? __jbd2_journal_file_buffer+0x188/0x260
> [<ffffffff81325d9c>] ? do_raw_spin_lock+0x10c/0x150
> [<ffffffff81268118>] ? do_get_write_access+0x448/0x650
> [<ffffffff81213caf>] ? ext4_read_inode_bitmap+0x9f/0x5f0
> [<ffffffff81167849>] ? kmem_cache_alloc+0x39/0x160
> [<ffffffff81268490>] ? jbd2_journal_get_write_access+0x30/0x50
> [<ffffffff81244d43>] ? __ext4_journal_get_write_access+0x43/0x90
> [<ffffffff812143d7>] ? ext4_free_inode+0x1d7/0x5d0
> [<ffffffff8121e601>] ? ext4_evict_inode+0x341/0x4d0
> [<ffffffff8121b268>] ? ext4_mark_inode_dirty+0x88/0x230
> [<ffffffff8121e610>] ? ext4_evict_inode+0x350/0x4d0
> [<ffffffff8118a098>] ? evict+0xb8/0x1c0
> [<ffffffff8118a9d5>] ? iput+0x105/0x190
> [<ffffffff8117d341>] ? do_unlinkat+0x201/0x270
> [<ffffffff8131e9be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [<ffffffff81180356>] ? SyS_unlink+0x16/0x20
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-08-09 17:36:38

by Sage Weil

[permalink] [raw]
Subject: Re: crash in __jbd2_journal_file_buffer

Hi Jan,

Sorry for the slow response; took a while for this to happen again. This
time I'm keeping the machine sitting in kdb in case there is more
information needed.

<4>[19307.314449] ------------[ cut here ]------------
<4>[19307.319114] WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237 jbd2_journal_dirty_metadata+0x1ba/0x260()

<4>[19307.382324] CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G W 3.10.0-ceph-00049-g68d04c9 #1
<4>[19307.391256] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
<4>[19307.398879] ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968
<4>[19307.407572] ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80
<4>[19307.415153] 0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978
<4>[19307.422633] Call Trace:
<4>[19307.425209] [<ffffffff816311b0>] dump_stack+0x19/0x1b
<4>[19307.430368] [<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0
<4>[19307.436502] [<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20
<4>[19307.442356] [<ffffffff81267c2a>] jbd2_journal_dirty_metadata+0x1ba/0x260
<4>[19307.449271] [<ffffffff81245093>] __ext4_handle_dirty_metadata+0xa3/0x140
<4>[19307.456192] [<ffffffff812561f3>] ext4_xattr_release_block+0x103/0x1f0
<4>[19307.462742] [<ffffffff81256680>] ext4_xattr_block_set+0x1e0/0x910
<4>[19307.469049] [<ffffffff8125795b>] ext4_xattr_set_handle+0x38b/0x4a0
<4>[19307.475445] [<ffffffff810a319d>] ? trace_hardirqs_on+0xd/0x10
<4>[19307.481300] [<ffffffff81257b32>] ext4_xattr_set+0xc2/0x140
<4>[19307.486995] [<ffffffff81258547>] ext4_xattr_user_set+0x47/0x50
<4>[19307.492936] [<ffffffff811935ce>] generic_setxattr+0x6e/0x90
<4>[19307.498734] [<ffffffff81193ecb>] __vfs_setxattr_noperm+0x7b/0x1c0
<4>[19307.505049] [<ffffffff811940d4>] vfs_setxattr+0xc4/0xd0
<4>[19307.510380] [<ffffffff8119421e>] setxattr+0x13e/0x1e0
<4>[19307.515646] [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
<4>[19307.521587] [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
<4>[19307.527812] [<ffffffff8118c65c>] ? fget_light+0x3c/0x130
<4>[19307.533230] [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
<4>[19307.539522] [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
<4>[19307.545492] [<ffffffff811946be>] SyS_fsetxattr+0xbe/0x100
<4>[19307.551001] [<ffffffff816407c2>] system_call_fastpath+0x16/0x1b
<4>[19307.557137] ---[ end trace 3e447f9462172c58 ]---
<2>[19307.561776] EXT4-fs error (device sda1) in ext4_handle_dirty_xattr_block:167: error 117
<3>[19307.570181] Aborting journal on device sda1-8.
<2>[19307.604589] EXT4-fs (sda1): Remounting filesystem read-only
<2>[19307.610273] EXT4-fs error (device sda1) in ext4_xattr_release_block:558: error 117
<0>[19307.623337] journal commit I/O error
<0>[19307.623342] journal commit I/O error
<0>[19307.623405] journal commit I/O error
<0>[19307.623519] journal commit I/O error
<2>[19307.623585] EXT4-fs error (device sda1): __ext4_journal_start_sb:62: Detected aborted journal
<1>[19307.623642] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
<1>[19307.623649] IP: [<ffffffff81267c4a>] jbd2_journal_dirty_metadata+0x1da/0x260
<4>[19307.623653] PGD 20bc32067 PUD 2245fc067 PMD 0
<4>[19307.623657] Oops: 0000 [#1] SMP
[dumpcommon]kdb> -bt

Stack traceback for pid 8877
0xffff88020db7bf20 8877 8795 1 4 R 0xffff88020db7c3a8 *ceph-osd
ffff880214469a08 0000000000000018 ffffffff81267ad0 ffffea000834d600
ffff880226c03300 ffffffff81256562 0000000000000282 ffff880214469a68
0000000000000000 00000000000011a5 ffffffff81825270 ffff8802256e7510
Call Trace:
[<ffffffff81267ad0>] ? jbd2_journal_dirty_metadata+0x60/0x260
[<ffffffff81256562>] ? ext4_xattr_block_set+0xc2/0x910
[<ffffffff81245093>] ? __ext4_handle_dirty_metadata+0xa3/0x140
[<ffffffff8121aeee>] ? ext4_mark_iloc_dirty+0x40e/0x660
[<ffffffff81257835>] ? ext4_xattr_set_handle+0x265/0x4a0
[<ffffffff81257b32>] ? ext4_xattr_set+0xc2/0x140
[<ffffffff81258547>] ? ext4_xattr_user_set+0x47/0x50
[<ffffffff811935ce>] ? generic_setxattr+0x6e/0x90
[<ffffffff81193ecb>] ? __vfs_setxattr_noperm+0x7b/0x1c0
[<ffffffff811940d4>] ? vfs_setxattr+0xc4/0xd0
[<ffffffff8119421e>] ? setxattr+0x13e/0x1e0
[<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
[<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
[<ffffffff8118c65c>] ? fget_light+0x3c/0x130
[<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
[<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
[<ffffffff811946be>] ? SyS_fsetxattr+0xbe/0x100
[<ffffffff816407c2>] ? system_call_fastpath+0x16/0x1b


The workload is a ceph-osd daemon, which tends to hammer pretty heavily on
the xattr paths. I don't have a nice self-contained reproducer or
anything since this is is falling out of our integration tests.
Hopefully there is enough here (or that can be gleaned from kdb) that it
is clear what is going on. It's v3.10 (plus some unrelated patches).

Thanks!
sage



On Wed, 31 Jul 2013, Jan Kara wrote:
> Hello,
>
> On Tue 30-07-13 15:41:40, Sage Weil wrote:
> > Hit this on 3.10. Is this a known issue?
> No, I haven't seen it. Why did the kernel crash?
>
> Honza
> >
> > Thanks-
> > sage
> >
> >
> > Stack traceback for pid 23944
> > 0xffff8802206edeb0 23944 23840 1 2 R 0xffff8802206ee338
> > *ceph-osd
> > ffff88020bf17b78 0000000000000018 ffffffff81267398 ffff8802256d40c0
> > ffff88020b5a1230 ffff88020bf17bb8 ffffffff81325d9c ffff88020b5a1230
> > ffff8802256d40c0 ffff88020b5a1230 00000000698a8d24 ffff88020bf17be8
> > Call Trace:
> > [<ffffffff81267398>] ? __jbd2_journal_file_buffer+0x188/0x260
> > [<ffffffff81325d9c>] ? do_raw_spin_lock+0x10c/0x150
> > [<ffffffff81268118>] ? do_get_write_access+0x448/0x650
> > [<ffffffff81213caf>] ? ext4_read_inode_bitmap+0x9f/0x5f0
> > [<ffffffff81167849>] ? kmem_cache_alloc+0x39/0x160
> > [<ffffffff81268490>] ? jbd2_journal_get_write_access+0x30/0x50
> > [<ffffffff81244d43>] ? __ext4_journal_get_write_access+0x43/0x90
> > [<ffffffff812143d7>] ? ext4_free_inode+0x1d7/0x5d0
> > [<ffffffff8121e601>] ? ext4_evict_inode+0x341/0x4d0
> > [<ffffffff8121b268>] ? ext4_mark_inode_dirty+0x88/0x230
> > [<ffffffff8121e610>] ? ext4_evict_inode+0x350/0x4d0
> > [<ffffffff8118a098>] ? evict+0xb8/0x1c0
> > [<ffffffff8118a9d5>] ? iput+0x105/0x190
> > [<ffffffff8117d341>] ? do_unlinkat+0x201/0x270
> > [<ffffffff8131e9be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> > [<ffffffff81180356>] ? SyS_unlink+0x16/0x20
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR
>
>

2013-08-09 21:24:28

by Jan Kara

[permalink] [raw]
Subject: Re: crash in __jbd2_journal_file_buffer

On Fri 09-08-13 10:36:37, Sage Weil wrote:
> Hi Jan,
>
> Sorry for the slow response; took a while for this to happen again. This
> time I'm keeping the machine sitting in kdb in case there is more
> information needed.
>
> <4>[19307.314449] ------------[ cut here ]------------
> <4>[19307.319114] WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237 jbd2_journal_dirty_metadata+0x1ba/0x260()
>
> <4>[19307.382324] CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G W 3.10.0-ceph-00049-g68d04c9 #1
> <4>[19307.391256] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> <4>[19307.398879] ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968
> <4>[19307.407572] ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80
> <4>[19307.415153] 0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978
> <4>[19307.422633] Call Trace:
> <4>[19307.425209] [<ffffffff816311b0>] dump_stack+0x19/0x1b
> <4>[19307.430368] [<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0
> <4>[19307.436502] [<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20
> <4>[19307.442356] [<ffffffff81267c2a>] jbd2_journal_dirty_metadata+0x1ba/0x260
> <4>[19307.449271] [<ffffffff81245093>] __ext4_handle_dirty_metadata+0xa3/0x140
> <4>[19307.456192] [<ffffffff812561f3>] ext4_xattr_release_block+0x103/0x1f0
> <4>[19307.462742] [<ffffffff81256680>] ext4_xattr_block_set+0x1e0/0x910
> <4>[19307.469049] [<ffffffff8125795b>] ext4_xattr_set_handle+0x38b/0x4a0
> <4>[19307.475445] [<ffffffff810a319d>] ? trace_hardirqs_on+0xd/0x10
> <4>[19307.481300] [<ffffffff81257b32>] ext4_xattr_set+0xc2/0x140
> <4>[19307.486995] [<ffffffff81258547>] ext4_xattr_user_set+0x47/0x50
> <4>[19307.492936] [<ffffffff811935ce>] generic_setxattr+0x6e/0x90
> <4>[19307.498734] [<ffffffff81193ecb>] __vfs_setxattr_noperm+0x7b/0x1c0
> <4>[19307.505049] [<ffffffff811940d4>] vfs_setxattr+0xc4/0xd0
> <4>[19307.510380] [<ffffffff8119421e>] setxattr+0x13e/0x1e0
> <4>[19307.515646] [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
> <4>[19307.521587] [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
> <4>[19307.527812] [<ffffffff8118c65c>] ? fget_light+0x3c/0x130
> <4>[19307.533230] [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
> <4>[19307.539522] [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
> <4>[19307.545492] [<ffffffff811946be>] SyS_fsetxattr+0xbe/0x100
> <4>[19307.551001] [<ffffffff816407c2>] system_call_fastpath+0x16/0x1b
> <4>[19307.557137] ---[ end trace 3e447f9462172c58 ]---
Hum, weird. So we returned EUCLEAN from jbd2_journal_dirty_metadata()
meaning that buffer head passed in there didn't have journal head attached.
However ext4_xattr_release_block() does call
ext4_journal_get_write_access() for the bh before calling
ext4_handle_dirty_xattr_block(). So we should have bh attached to the
running transaction which also keeps journal head in place.

> <2>[19307.561776] EXT4-fs error (device sda1) in ext4_handle_dirty_xattr_block:167: error 117
> <3>[19307.570181] Aborting journal on device sda1-8.
> <2>[19307.604589] EXT4-fs (sda1): Remounting filesystem read-only
> <2>[19307.610273] EXT4-fs error (device sda1) in ext4_xattr_release_block:558: error 117
> <0>[19307.623337] journal commit I/O error
> <0>[19307.623342] journal commit I/O error
> <0>[19307.623405] journal commit I/O error
> <0>[19307.623519] journal commit I/O error
> <2>[19307.623585] EXT4-fs error (device sda1): __ext4_journal_start_sb:62: Detected aborted journal
> <1>[19307.623642] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> <1>[19307.623649] IP: [<ffffffff81267c4a>] jbd2_journal_dirty_metadata+0x1da/0x260
Can you possibly disassemble where exactly did we crash in this function?
Apparently we didn't properly handle the fact that the journal got aborted
here. But I'm not sure what exactly failed.

Honza

> <4>[19307.623653] PGD 20bc32067 PUD 2245fc067 PMD 0
> <4>[19307.623657] Oops: 0000 [#1] SMP
> [dumpcommon]kdb> -bt
>
> Stack traceback for pid 8877
> 0xffff88020db7bf20 8877 8795 1 4 R 0xffff88020db7c3a8 *ceph-osd
> ffff880214469a08 0000000000000018 ffffffff81267ad0 ffffea000834d600
> ffff880226c03300 ffffffff81256562 0000000000000282 ffff880214469a68
> 0000000000000000 00000000000011a5 ffffffff81825270 ffff8802256e7510
> Call Trace:
> [<ffffffff81267ad0>] ? jbd2_journal_dirty_metadata+0x60/0x260
> [<ffffffff81256562>] ? ext4_xattr_block_set+0xc2/0x910
> [<ffffffff81245093>] ? __ext4_handle_dirty_metadata+0xa3/0x140
> [<ffffffff8121aeee>] ? ext4_mark_iloc_dirty+0x40e/0x660
> [<ffffffff81257835>] ? ext4_xattr_set_handle+0x265/0x4a0
> [<ffffffff81257b32>] ? ext4_xattr_set+0xc2/0x140
> [<ffffffff81258547>] ? ext4_xattr_user_set+0x47/0x50
> [<ffffffff811935ce>] ? generic_setxattr+0x6e/0x90
> [<ffffffff81193ecb>] ? __vfs_setxattr_noperm+0x7b/0x1c0
> [<ffffffff811940d4>] ? vfs_setxattr+0xc4/0xd0
> [<ffffffff8119421e>] ? setxattr+0x13e/0x1e0
> [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
> [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
> [<ffffffff8118c65c>] ? fget_light+0x3c/0x130
> [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
> [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
> [<ffffffff811946be>] ? SyS_fsetxattr+0xbe/0x100
> [<ffffffff816407c2>] ? system_call_fastpath+0x16/0x1b
>
>
> The workload is a ceph-osd daemon, which tends to hammer pretty heavily on
> the xattr paths. I don't have a nice self-contained reproducer or
> anything since this is is falling out of our integration tests.
> Hopefully there is enough here (or that can be gleaned from kdb) that it
> is clear what is going on. It's v3.10 (plus some unrelated patches).
>
> Thanks!
> sage
>
>
>
> On Wed, 31 Jul 2013, Jan Kara wrote:
> > Hello,
> >
> > On Tue 30-07-13 15:41:40, Sage Weil wrote:
> > > Hit this on 3.10. Is this a known issue?
> > No, I haven't seen it. Why did the kernel crash?
> >
> > Honza
> > >
> > > Thanks-
> > > sage
> > >
> > >
> > > Stack traceback for pid 23944
> > > 0xffff8802206edeb0 23944 23840 1 2 R 0xffff8802206ee338
> > > *ceph-osd
> > > ffff88020bf17b78 0000000000000018 ffffffff81267398 ffff8802256d40c0
> > > ffff88020b5a1230 ffff88020bf17bb8 ffffffff81325d9c ffff88020b5a1230
> > > ffff8802256d40c0 ffff88020b5a1230 00000000698a8d24 ffff88020bf17be8
> > > Call Trace:
> > > [<ffffffff81267398>] ? __jbd2_journal_file_buffer+0x188/0x260
> > > [<ffffffff81325d9c>] ? do_raw_spin_lock+0x10c/0x150
> > > [<ffffffff81268118>] ? do_get_write_access+0x448/0x650
> > > [<ffffffff81213caf>] ? ext4_read_inode_bitmap+0x9f/0x5f0
> > > [<ffffffff81167849>] ? kmem_cache_alloc+0x39/0x160
> > > [<ffffffff81268490>] ? jbd2_journal_get_write_access+0x30/0x50
> > > [<ffffffff81244d43>] ? __ext4_journal_get_write_access+0x43/0x90
> > > [<ffffffff812143d7>] ? ext4_free_inode+0x1d7/0x5d0
> > > [<ffffffff8121e601>] ? ext4_evict_inode+0x341/0x4d0
> > > [<ffffffff8121b268>] ? ext4_mark_inode_dirty+0x88/0x230
> > > [<ffffffff8121e610>] ? ext4_evict_inode+0x350/0x4d0
> > > [<ffffffff8118a098>] ? evict+0xb8/0x1c0
> > > [<ffffffff8118a9d5>] ? iput+0x105/0x190
> > > [<ffffffff8117d341>] ? do_unlinkat+0x201/0x270
> > > [<ffffffff8131e9be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> > > [<ffffffff81180356>] ? SyS_unlink+0x16/0x20
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > > the body of a message to [email protected]
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > --
> > Jan Kara <[email protected]>
> > SUSE Labs, CR
> >
> >
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-08-09 22:11:07

by Sage Weil

[permalink] [raw]
Subject: Re: crash in __jbd2_journal_file_buffer

On Fri, 9 Aug 2013, Jan Kara wrote:
> On Fri 09-08-13 10:36:37, Sage Weil wrote:
> > Hi Jan,
> >
> > Sorry for the slow response; took a while for this to happen again. This
> > time I'm keeping the machine sitting in kdb in case there is more
> > information needed.
> >
> > <4>[19307.314449] ------------[ cut here ]------------
> > <4>[19307.319114] WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237 jbd2_journal_dirty_metadata+0x1ba/0x260()
> >
> > <4>[19307.382324] CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G W 3.10.0-ceph-00049-g68d04c9 #1
> > <4>[19307.391256] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> > <4>[19307.398879] ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968
> > <4>[19307.407572] ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80
> > <4>[19307.415153] 0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978
> > <4>[19307.422633] Call Trace:
> > <4>[19307.425209] [<ffffffff816311b0>] dump_stack+0x19/0x1b
> > <4>[19307.430368] [<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0
> > <4>[19307.436502] [<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20
> > <4>[19307.442356] [<ffffffff81267c2a>] jbd2_journal_dirty_metadata+0x1ba/0x260
> > <4>[19307.449271] [<ffffffff81245093>] __ext4_handle_dirty_metadata+0xa3/0x140
> > <4>[19307.456192] [<ffffffff812561f3>] ext4_xattr_release_block+0x103/0x1f0
> > <4>[19307.462742] [<ffffffff81256680>] ext4_xattr_block_set+0x1e0/0x910
> > <4>[19307.469049] [<ffffffff8125795b>] ext4_xattr_set_handle+0x38b/0x4a0
> > <4>[19307.475445] [<ffffffff810a319d>] ? trace_hardirqs_on+0xd/0x10
> > <4>[19307.481300] [<ffffffff81257b32>] ext4_xattr_set+0xc2/0x140
> > <4>[19307.486995] [<ffffffff81258547>] ext4_xattr_user_set+0x47/0x50
> > <4>[19307.492936] [<ffffffff811935ce>] generic_setxattr+0x6e/0x90
> > <4>[19307.498734] [<ffffffff81193ecb>] __vfs_setxattr_noperm+0x7b/0x1c0
> > <4>[19307.505049] [<ffffffff811940d4>] vfs_setxattr+0xc4/0xd0
> > <4>[19307.510380] [<ffffffff8119421e>] setxattr+0x13e/0x1e0
> > <4>[19307.515646] [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
> > <4>[19307.521587] [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
> > <4>[19307.527812] [<ffffffff8118c65c>] ? fget_light+0x3c/0x130
> > <4>[19307.533230] [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
> > <4>[19307.539522] [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
> > <4>[19307.545492] [<ffffffff811946be>] SyS_fsetxattr+0xbe/0x100
> > <4>[19307.551001] [<ffffffff816407c2>] system_call_fastpath+0x16/0x1b
> > <4>[19307.557137] ---[ end trace 3e447f9462172c58 ]---
> Hum, weird. So we returned EUCLEAN from jbd2_journal_dirty_metadata()
> meaning that buffer head passed in there didn't have journal head attached.
> However ext4_xattr_release_block() does call
> ext4_journal_get_write_access() for the bh before calling
> ext4_handle_dirty_xattr_block(). So we should have bh attached to the
> running transaction which also keeps journal head in place.
>
> > <2>[19307.561776] EXT4-fs error (device sda1) in ext4_handle_dirty_xattr_block:167: error 117
> > <3>[19307.570181] Aborting journal on device sda1-8.
> > <2>[19307.604589] EXT4-fs (sda1): Remounting filesystem read-only
> > <2>[19307.610273] EXT4-fs error (device sda1) in ext4_xattr_release_block:558: error 117
> > <0>[19307.623337] journal commit I/O error
> > <0>[19307.623342] journal commit I/O error
> > <0>[19307.623405] journal commit I/O error
> > <0>[19307.623519] journal commit I/O error
> > <2>[19307.623585] EXT4-fs error (device sda1): __ext4_journal_start_sb:62: Detected aborted journal
> > <1>[19307.623642] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> > <1>[19307.623649] IP: [<ffffffff81267c4a>] jbd2_journal_dirty_metadata+0x1da/0x260
> Can you possibly disassemble where exactly did we crash in this function?
> Apparently we didn't properly handle the fact that the journal got aborted
> here. But I'm not sure what exactly failed.

Hrm, I'm having trouble getting from a bzImage back to something I can
objdump with inline code, so this is a separate compile with the same
.config that is hopefully right:

jh->b_modified = 1;
J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
1e33: 0f 0b ud2
*/
if (jh->b_transaction != transaction) {
JBUFFER_TRACE(jh, "already on other transaction");
if (unlikely(jh->b_transaction !=
journal->j_committing_transaction)) {
printk(KERN_EMERG "JBD: %s: "
1e35: 4d 85 c9 test %r9,%r9
1e38: 74 04 je 1e3e <jbd2_journal_dirty_metadata+0x1de>
1e3a: 41 8b 41 08 mov 0x8(%r9),%eax
**here^^^
1e3e: 45 31 c0 xor %r8d,%r8d
1e41: 48 85 c9 test %rcx,%rcx
1e44: 74 04 je 1e4a <jbd2_journal_dirty_metadata+0x1ea>
1e46: 44 8b 41 08 mov 0x8(%rcx),%r8d
1e4a: 48 8b 53 18 mov 0x18(%rbx),%rdx
"jh->b_transaction (%llu, %p, %u) != "
"journal->j_committing_transaction (%p, %u)",
journal->j_devname,
1e4e: 49 8d b6 80 05 00 00 lea 0x580(%r14),%rsi
*/
if (jh->b_transaction != transaction) {
JBUFFER_TRACE(jh, "already on other transaction");
if (unlikely(jh->b_transaction !=
journal->j_committing_transaction)) {
printk(KERN_EMERG "JBD: %s: "
1e55: 89 04 24 mov %eax,(%rsp)
1e58: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
1e5b: R_X86_64_32S .rodata.str1.8+0x160
1e5f: 31 c0 xor %eax,%eax
1e61: e8 00 00 00 00 callq 1e66 <jbd2_journal_dirty_metadata+0x206>
1e62: R_X86_64_PC32 printk-0x4
jh->b_transaction,
jh->b_transaction ? jh->b_transaction->t_tid : 0,
journal->j_committing_transaction,
journal->j_committing_transaction ?
journal->j_committing_transaction->t_tid : 0);
ret = -EINVAL;
1e66: b8 ea ff ff ff mov $0xffffffea,%eax
1e6b: e9 a4 fe ff ff jmpq 1d14 <jbd2_journal_dirty_metadata+0xb4>
}
if (unlikely(jh->b_next_transaction != transaction)) {
printk(KERN_EMERG "JBD: %s: "
1e70: 45 31 c0 xor %r8d,%r8d
1e73: 48 85 c9 test %rcx,%rcx
1e76: 41 8b 44 24 08 mov 0x8(%r12),%eax
1e7b: 74 04 je 1e81 <jbd2_journal_dirty_metadata+0x221>
1e7d: 44 8b 41 08 mov 0x8(%rcx),%r8d
1e81: 48 8b 53 18 mov 0x18(%rbx),%rdx
"jh->b_next_transaction (%llu, %p, %u) != "
"transaction (%p, %u)",
journal->j_devname,
1e85: 49 8d b6 80 05 00 00 lea 0x580(%r14),%rsi
journal->j_committing_transaction ?
journal->j_committing_transaction->t_tid : 0);
ret = -EINVAL;
}
if (unlikely(jh->b_next_transaction != transaction)) {
printk(KERN_EMERG "JBD: %s: "
1e8c: 89 04 24 mov %eax,(%rsp)
1e8f: 4d 89 e1 mov %r12,%r9
1e92: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
1e95: R_X86_64_32S .rodata.str1.8+0x1c0
1e99: 31 c0 xor %eax,%eax
1e9b: e8 00 00 00 00 callq 1ea0 <jbd2_journal_dirty_metadata+0x240>
1e9c: R_X86_64_PC32 printk-0x4
(unsigned long long) bh->b_blocknr,
jh->b_next_transaction,
jh->b_next_transaction ?
jh->b_next_transaction->t_tid : 0,
transaction, transaction->t_tid);
ret = -EINVAL;
1ea0: b8 ea ff ff ff mov $0xffffffea,%eax
1ea5: e9 77 fe ff ff jmpq 1d21 <jbd2_journal_dirty_metadata+0xc1>
* bit-based spin_unlock()
*/
static inline void bit_spin_unlock(int bitnum, unsigned long *addr)
{
#ifdef CONFIG_DEBUG_SPINLOCK
BUG_ON(!test_bit(bitnum, addr));
1eaa: e8 00 00 00 00 callq 1eaf <jbd2_journal_dirty_metadata+0x24f>
1eab: R_X86_64_PC32 .text.unlikely-0x4
* transaction's data buffer, ever. */

Registers are

[4]kdb> rd
ax: 0000000000000000 bx: ffff8801f57837b8 cx: 0000000000000000
dx: 000000000091c029 si: 000000000091c029 di: 0000000000000001
bp: ffff880214469a58 sp: ffff880214469a08 r8: ffff8801f57837b8
r9: 0000000000000008 r10: 0000000000000000 r11: 0000000000000000
r12: ffff8802256e7870 r13: ffff8801893c52a0 r14: ffff8802256e76c0
r15: ffff8802256e7510 ip: ffffffff81267c4a flags: 00010202 cs: 00000010
ss: 00000018 ds: 00000018 es: 00000018 fs: 00000018 gs: 00000018

journal == NULL?

sage


>
> Honza
>
> > <4>[19307.623653] PGD 20bc32067 PUD 2245fc067 PMD 0
> > <4>[19307.623657] Oops: 0000 [#1] SMP
> > [dumpcommon]kdb> -bt
> >
> > Stack traceback for pid 8877
> > 0xffff88020db7bf20 8877 8795 1 4 R 0xffff88020db7c3a8 *ceph-osd
> > ffff880214469a08 0000000000000018 ffffffff81267ad0 ffffea000834d600
> > ffff880226c03300 ffffffff81256562 0000000000000282 ffff880214469a68
> > 0000000000000000 00000000000011a5 ffffffff81825270 ffff8802256e7510
> > Call Trace:
> > [<ffffffff81267ad0>] ? jbd2_journal_dirty_metadata+0x60/0x260
> > [<ffffffff81256562>] ? ext4_xattr_block_set+0xc2/0x910
> > [<ffffffff81245093>] ? __ext4_handle_dirty_metadata+0xa3/0x140
> > [<ffffffff8121aeee>] ? ext4_mark_iloc_dirty+0x40e/0x660
> > [<ffffffff81257835>] ? ext4_xattr_set_handle+0x265/0x4a0
> > [<ffffffff81257b32>] ? ext4_xattr_set+0xc2/0x140
> > [<ffffffff81258547>] ? ext4_xattr_user_set+0x47/0x50
> > [<ffffffff811935ce>] ? generic_setxattr+0x6e/0x90
> > [<ffffffff81193ecb>] ? __vfs_setxattr_noperm+0x7b/0x1c0
> > [<ffffffff811940d4>] ? vfs_setxattr+0xc4/0xd0
> > [<ffffffff8119421e>] ? setxattr+0x13e/0x1e0
> > [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
> > [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
> > [<ffffffff8118c65c>] ? fget_light+0x3c/0x130
> > [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
> > [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
> > [<ffffffff811946be>] ? SyS_fsetxattr+0xbe/0x100
> > [<ffffffff816407c2>] ? system_call_fastpath+0x16/0x1b
> >
> >
> > The workload is a ceph-osd daemon, which tends to hammer pretty heavily on
> > the xattr paths. I don't have a nice self-contained reproducer or
> > anything since this is is falling out of our integration tests.
> > Hopefully there is enough here (or that can be gleaned from kdb) that it
> > is clear what is going on. It's v3.10 (plus some unrelated patches).
> >
> > Thanks!
> > sage
> >
> >
> >
> > On Wed, 31 Jul 2013, Jan Kara wrote:
> > > Hello,
> > >
> > > On Tue 30-07-13 15:41:40, Sage Weil wrote:
> > > > Hit this on 3.10. Is this a known issue?
> > > No, I haven't seen it. Why did the kernel crash?
> > >
> > > Honza
> > > >
> > > > Thanks-
> > > > sage
> > > >
> > > >
> > > > Stack traceback for pid 23944
> > > > 0xffff8802206edeb0 23944 23840 1 2 R 0xffff8802206ee338
> > > > *ceph-osd
> > > > ffff88020bf17b78 0000000000000018 ffffffff81267398 ffff8802256d40c0
> > > > ffff88020b5a1230 ffff88020bf17bb8 ffffffff81325d9c ffff88020b5a1230
> > > > ffff8802256d40c0 ffff88020b5a1230 00000000698a8d24 ffff88020bf17be8
> > > > Call Trace:
> > > > [<ffffffff81267398>] ? __jbd2_journal_file_buffer+0x188/0x260
> > > > [<ffffffff81325d9c>] ? do_raw_spin_lock+0x10c/0x150
> > > > [<ffffffff81268118>] ? do_get_write_access+0x448/0x650
> > > > [<ffffffff81213caf>] ? ext4_read_inode_bitmap+0x9f/0x5f0
> > > > [<ffffffff81167849>] ? kmem_cache_alloc+0x39/0x160
> > > > [<ffffffff81268490>] ? jbd2_journal_get_write_access+0x30/0x50
> > > > [<ffffffff81244d43>] ? __ext4_journal_get_write_access+0x43/0x90
> > > > [<ffffffff812143d7>] ? ext4_free_inode+0x1d7/0x5d0
> > > > [<ffffffff8121e601>] ? ext4_evict_inode+0x341/0x4d0
> > > > [<ffffffff8121b268>] ? ext4_mark_inode_dirty+0x88/0x230
> > > > [<ffffffff8121e610>] ? ext4_evict_inode+0x350/0x4d0
> > > > [<ffffffff8118a098>] ? evict+0xb8/0x1c0
> > > > [<ffffffff8118a9d5>] ? iput+0x105/0x190
> > > > [<ffffffff8117d341>] ? do_unlinkat+0x201/0x270
> > > > [<ffffffff8131e9be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> > > > [<ffffffff81180356>] ? SyS_unlink+0x16/0x20
> > > >
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > > > the body of a message to [email protected]
> > > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > > --
> > > Jan Kara <[email protected]>
> > > SUSE Labs, CR
> > >
> > >
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR
>
>

2013-08-12 12:53:00

by Jan Kara

[permalink] [raw]
Subject: Re: crash in __jbd2_journal_file_buffer

On Fri 09-08-13 15:11:05, Sage Weil wrote:
> On Fri, 9 Aug 2013, Jan Kara wrote:
> > On Fri 09-08-13 10:36:37, Sage Weil wrote:
> > > Hi Jan,
> > >
> > > Sorry for the slow response; took a while for this to happen again. This
> > > time I'm keeping the machine sitting in kdb in case there is more
> > > information needed.
> > >
> > > <4>[19307.314449] ------------[ cut here ]------------
> > > <4>[19307.319114] WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237 jbd2_journal_dirty_metadata+0x1ba/0x260()
> > >
> > > <4>[19307.382324] CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G W 3.10.0-ceph-00049-g68d04c9 #1
> > > <4>[19307.391256] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> > > <4>[19307.398879] ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968
> > > <4>[19307.407572] ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80
> > > <4>[19307.415153] 0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978
> > > <4>[19307.422633] Call Trace:
> > > <4>[19307.425209] [<ffffffff816311b0>] dump_stack+0x19/0x1b
> > > <4>[19307.430368] [<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0
> > > <4>[19307.436502] [<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20
> > > <4>[19307.442356] [<ffffffff81267c2a>] jbd2_journal_dirty_metadata+0x1ba/0x260
> > > <4>[19307.449271] [<ffffffff81245093>] __ext4_handle_dirty_metadata+0xa3/0x140
> > > <4>[19307.456192] [<ffffffff812561f3>] ext4_xattr_release_block+0x103/0x1f0
> > > <4>[19307.462742] [<ffffffff81256680>] ext4_xattr_block_set+0x1e0/0x910
> > > <4>[19307.469049] [<ffffffff8125795b>] ext4_xattr_set_handle+0x38b/0x4a0
> > > <4>[19307.475445] [<ffffffff810a319d>] ? trace_hardirqs_on+0xd/0x10
> > > <4>[19307.481300] [<ffffffff81257b32>] ext4_xattr_set+0xc2/0x140
> > > <4>[19307.486995] [<ffffffff81258547>] ext4_xattr_user_set+0x47/0x50
> > > <4>[19307.492936] [<ffffffff811935ce>] generic_setxattr+0x6e/0x90
> > > <4>[19307.498734] [<ffffffff81193ecb>] __vfs_setxattr_noperm+0x7b/0x1c0
> > > <4>[19307.505049] [<ffffffff811940d4>] vfs_setxattr+0xc4/0xd0
> > > <4>[19307.510380] [<ffffffff8119421e>] setxattr+0x13e/0x1e0
> > > <4>[19307.515646] [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
> > > <4>[19307.521587] [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
> > > <4>[19307.527812] [<ffffffff8118c65c>] ? fget_light+0x3c/0x130
> > > <4>[19307.533230] [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
> > > <4>[19307.539522] [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
> > > <4>[19307.545492] [<ffffffff811946be>] SyS_fsetxattr+0xbe/0x100
> > > <4>[19307.551001] [<ffffffff816407c2>] system_call_fastpath+0x16/0x1b
> > > <4>[19307.557137] ---[ end trace 3e447f9462172c58 ]---
> > Hum, weird. So we returned EUCLEAN from jbd2_journal_dirty_metadata()
> > meaning that buffer head passed in there didn't have journal head attached.
> > However ext4_xattr_release_block() does call
> > ext4_journal_get_write_access() for the bh before calling
> > ext4_handle_dirty_xattr_block(). So we should have bh attached to the
> > running transaction which also keeps journal head in place.
> >
> > > <2>[19307.561776] EXT4-fs error (device sda1) in ext4_handle_dirty_xattr_block:167: error 117
> > > <3>[19307.570181] Aborting journal on device sda1-8.
> > > <2>[19307.604589] EXT4-fs (sda1): Remounting filesystem read-only
> > > <2>[19307.610273] EXT4-fs error (device sda1) in ext4_xattr_release_block:558: error 117
> > > <0>[19307.623337] journal commit I/O error
> > > <0>[19307.623342] journal commit I/O error
> > > <0>[19307.623405] journal commit I/O error
> > > <0>[19307.623519] journal commit I/O error
> > > <2>[19307.623585] EXT4-fs error (device sda1): __ext4_journal_start_sb:62: Detected aborted journal
> > > <1>[19307.623642] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> > > <1>[19307.623649] IP: [<ffffffff81267c4a>] jbd2_journal_dirty_metadata+0x1da/0x260
> > Can you possibly disassemble where exactly did we crash in this function?
> > Apparently we didn't properly handle the fact that the journal got aborted
> > here. But I'm not sure what exactly failed.
>
> Hrm, I'm having trouble getting from a bzImage back to something I can
> objdump with inline code, so this is a separate compile with the same
> .config that is hopefully right:
>
> jh->b_modified = 1;
> J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
> 1e33: 0f 0b ud2
> */
> if (jh->b_transaction != transaction) {
> JBUFFER_TRACE(jh, "already on other transaction");
> if (unlikely(jh->b_transaction !=
> journal->j_committing_transaction)) {
> printk(KERN_EMERG "JBD: %s: "
> 1e35: 4d 85 c9 test %r9,%r9
> 1e38: 74 04 je 1e3e <jbd2_journal_dirty_metadata+0x1de>
> 1e3a: 41 8b 41 08 mov 0x8(%r9),%eax
> **here^^^
> 1e3e: 45 31 c0 xor %r8d,%r8d
> 1e41: 48 85 c9 test %rcx,%rcx
> 1e44: 74 04 je 1e4a <jbd2_journal_dirty_metadata+0x1ea>
> 1e46: 44 8b 41 08 mov 0x8(%rcx),%r8d
> 1e4a: 48 8b 53 18 mov 0x18(%rbx),%rdx
> "jh->b_transaction (%llu, %p, %u) != "
> "journal->j_committing_transaction (%p, %u)",
> journal->j_devname,
> 1e4e: 49 8d b6 80 05 00 00 lea 0x580(%r14),%rsi
> */
> if (jh->b_transaction != transaction) {
> JBUFFER_TRACE(jh, "already on other transaction");
> if (unlikely(jh->b_transaction !=
> journal->j_committing_transaction)) {
> printk(KERN_EMERG "JBD: %s: "
> 1e55: 89 04 24 mov %eax,(%rsp)
> 1e58: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
> 1e5b: R_X86_64_32S .rodata.str1.8+0x160
> 1e5f: 31 c0 xor %eax,%eax
> 1e61: e8 00 00 00 00 callq 1e66 <jbd2_journal_dirty_metadata+0x206>
> 1e62: R_X86_64_PC32 printk-0x4
> jh->b_transaction,
> jh->b_transaction ? jh->b_transaction->t_tid : 0,
> journal->j_committing_transaction,
> journal->j_committing_transaction ?
> journal->j_committing_transaction->t_tid : 0);
> ret = -EINVAL;
> 1e66: b8 ea ff ff ff mov $0xffffffea,%eax
> 1e6b: e9 a4 fe ff ff jmpq 1d14 <jbd2_journal_dirty_metadata+0xb4>
> }
> if (unlikely(jh->b_next_transaction != transaction)) {
> printk(KERN_EMERG "JBD: %s: "
> 1e70: 45 31 c0 xor %r8d,%r8d
> 1e73: 48 85 c9 test %rcx,%rcx
> 1e76: 41 8b 44 24 08 mov 0x8(%r12),%eax
> 1e7b: 74 04 je 1e81 <jbd2_journal_dirty_metadata+0x221>
> 1e7d: 44 8b 41 08 mov 0x8(%rcx),%r8d
> 1e81: 48 8b 53 18 mov 0x18(%rbx),%rdx
> "jh->b_next_transaction (%llu, %p, %u) != "
> "transaction (%p, %u)",
> journal->j_devname,
> 1e85: 49 8d b6 80 05 00 00 lea 0x580(%r14),%rsi
> journal->j_committing_transaction ?
> journal->j_committing_transaction->t_tid : 0);
> ret = -EINVAL;
> }
> if (unlikely(jh->b_next_transaction != transaction)) {
> printk(KERN_EMERG "JBD: %s: "
> 1e8c: 89 04 24 mov %eax,(%rsp)
> 1e8f: 4d 89 e1 mov %r12,%r9
> 1e92: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
> 1e95: R_X86_64_32S .rodata.str1.8+0x1c0
> 1e99: 31 c0 xor %eax,%eax
> 1e9b: e8 00 00 00 00 callq 1ea0 <jbd2_journal_dirty_metadata+0x240>
> 1e9c: R_X86_64_PC32 printk-0x4
> (unsigned long long) bh->b_blocknr,
> jh->b_next_transaction,
> jh->b_next_transaction ?
> jh->b_next_transaction->t_tid : 0,
> transaction, transaction->t_tid);
> ret = -EINVAL;
> 1ea0: b8 ea ff ff ff mov $0xffffffea,%eax
> 1ea5: e9 77 fe ff ff jmpq 1d21 <jbd2_journal_dirty_metadata+0xc1>
> * bit-based spin_unlock()
> */
> static inline void bit_spin_unlock(int bitnum, unsigned long *addr)
> {
> #ifdef CONFIG_DEBUG_SPINLOCK
> BUG_ON(!test_bit(bitnum, addr));
> 1eaa: e8 00 00 00 00 callq 1eaf <jbd2_journal_dirty_metadata+0x24f>
> 1eab: R_X86_64_PC32 .text.unlikely-0x4
> * transaction's data buffer, ever. */
>
> Registers are
>
> [4]kdb> rd
> ax: 0000000000000000 bx: ffff8801f57837b8 cx: 0000000000000000
> dx: 000000000091c029 si: 000000000091c029 di: 0000000000000001
> bp: ffff880214469a58 sp: ffff880214469a08 r8: ffff8801f57837b8
> r9: 0000000000000008 r10: 0000000000000000 r11: 0000000000000000
> r12: ffff8802256e7870 r13: ffff8801893c52a0 r14: ffff8802256e76c0
> r15: ffff8802256e7510 ip: ffffffff81267c4a flags: 00010202 cs: 00000010
> ss: 00000018 ds: 00000018 es: 00000018 fs: 00000018 gs: 00000018
>
> journal == NULL?
Umm, r9 is 0x8 so it seems jh->b_transaction or
journal->j_running_transaction was 8 (instead of NULL vs valid pointer) and
thus we oopsed while looking at ->t_tid... Oh, I see where's the problem.
__ext4_handle_dirty_metadata() will stop the handle if there's an error but
caller really doesn't count with this and further uses freed handle. So at
least the BUG part of the problem is clear now - I'll look into fixing
that.

But how did we pass bh without jh into jbd2_journal_dirty_metadata() still
isn't clear to me. BTW, I see 'W' taint flag in the warning which means
there was some warning before this one. What was that?

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-08-13 10:34:19

by Jan Kara

[permalink] [raw]
Subject: Re: crash in __jbd2_journal_file_buffer

On Mon 12-08-13 11:13:06, Sage Weil wrote:
> Full dmesg is attached.
Hum, nothing interesting in there...

> Our QA seems to hit this with some regularity. Let me know if there's
> some combination of patches that would help shed more light!
If they can run with attached debug patch it could maybe sched some more
light. Please send also your System.map file together with the dmesg of the
kernel when the crash happens so that I can map addresses to function
names... Thanks!

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR


Attachments:
(No filename) (542.00 B)
0001-jbd2-Debug-journal_head-additions-and-removals.patch (2.09 kB)
Download all attachments

2013-08-22 23:35:16

by Sage Weil

[permalink] [raw]
Subject: Re: crash in __jbd2_journal_file_buffer

On Tue, 13 Aug 2013, Jan Kara wrote:
> On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > Full dmesg is attached.
> Hum, nothing interesting in there...
>
> > Our QA seems to hit this with some regularity. Let me know if there's
> > some combination of patches that would help shed more light!
> If they can run with attached debug patch it could maybe sched some more
> light. Please send also your System.map file together with the dmesg of the
> kernel when the crash happens so that I can map addresses to function
> names... Thanks!

Okay, finally hit it again:

<6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
<3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
<4>[77877.441200] ------------[ cut here ]------------
<4>[77877.445845] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()

<4>[77877.497349] CPU: 7 PID: 26045 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00061-g546140d #1
<4>[77877.505649] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
<4>[77877.513213] 0000000000000564 ffff880131ca1938 ffffffff81642d85 ffff8802272ef290
<4>[77877.520694] 0000000000000000 ffff880131ca1978 ffffffff8104985c ffff880131ca19a0
<4>[77877.528218] ffff88020f695aa0 0000000000000000 ffff880214c48b40 ffff88020be55000
<4>[77877.535756] Call Trace:
<4>[77877.538279] [<ffffffff81642d85>] dump_stack+0x46/0x58
<4>[77877.543439] [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
<4>[77877.549548] [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
<4>[77877.555413] [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
<4>[77877.562288] [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
<4>[77877.569155] [<ffffffff81268e23>] ext4_xattr_release_block+0x103/0x1f0
<4>[77877.575723] [<ffffffff812692b0>] ext4_xattr_block_set+0x1e0/0x910
<4>[77877.581990] [<ffffffff8126a58b>] ext4_xattr_set_handle+0x38b/0x4a0
<4>[77877.588335] [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
<4>[77877.594188] [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
<4>[77877.599837] [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
<4>[77877.605779] [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
<4>[77877.611514] [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
<4>[77877.617773] [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
<4>[77877.623103] [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
<4>[77877.628317] [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
<4>[77877.634260] [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
<4>[77877.640428] [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
<4>[77877.645847] [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
<4>[77877.652015] [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
<4>[77877.657897] [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
<4>[77877.663405] [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
<4>[77877.669488] ---[ end trace bb7933908cd5a32a ]---
<2>[77877.674126] EXT4-fs error (device sda1) in ext4_handle_dirty_xattr_block:167: error 117
<3>[77877.692983] Aborting journal on device sda1-8.
<2>[77877.721561] EXT4-fs (sda1): Remounting filesystem read-only
<0>[77877.721657] journal commit I/O error
<0>[77877.721706] journal commit I/O error
<0>[77877.721707] journal commit I/O error
<2>[77877.727300] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
<2>[77877.727338] EXT4-fs (sda1): Remounting filesystem read-only
<2>[77877.727613] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
<2>[77877.727618] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
<2>[77877.727625] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
<2>[77877.778239] EXT4-fs error (device sda1) in ext4_xattr_release_block:558: error 117
<3>[77877.786051] Dirtying buffer without jh at 4302720332: state 10c029,jh added from 0xffffffff8127eb88 at 4302720326, removed from 0xffffffff8127b5b0 at 4302720274
<4>[77877.800516] ------------[ cut here ]------------
<4>[77877.805156] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()

<4>[77877.856583] CPU: 7 PID: 26045 Comm: ceph-osd Tainted: G W 3.11.0-rc5-ceph-00061-g546140d #1
<4>[77877.865896] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
<4>[77877.873475] 0000000000000564 ffff880131ca19b8 ffffffff81642d85 ffff8802272ef290
<4>[77877.880954] 0000000000000000 ffff880131ca19f8 ffffffff8104985c ffff880131ca1a20
<4>[77877.888488] ffff8801c499be58 0000000000000000 ffff880029010000 ffff880029010c30
<4>[77877.895962] Call Trace:
<4>[77877.898484] [<ffffffff81642d85>] dump_stack+0x46/0x58
<4>[77877.903643] [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
<4>[77877.909724] [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
<4>[77877.915577] [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
<4>[77877.922443] [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
<4>[77877.928353] [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
<4>[77877.935165] [<ffffffff8122ca3e>] ext4_mark_iloc_dirty+0x40e/0x660
<4>[77877.941421] [<ffffffff8126a465>] ext4_xattr_set_handle+0x265/0x4a0
<4>[77877.947766] [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
<4>[77877.953358] [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
<4>[77877.959354] [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
<4>[77877.965034] [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
<4>[77877.971289] [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
<4>[77877.976621] [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
<4>[77877.981837] [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
<4>[77877.987830] [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
<4>[77877.993944] [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
<4>[77877.999417] [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
<4>[77878.005530] [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
<4>[77878.011349] [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
<4>[77878.016856] [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
<4>[77878.022941] ---[ end trace bb7933908cd5a32b ]---
[7]kdb> rd
ax: ffff88020aadbf20 bx: ffff8800290100a0 cx: 0000000000000000
dx: ffff8800290100a0 si: ffff8800290100a0 di: ffff8800290100a0
bp: ffff880131ca1a28 sp: ffff880131ca1978 r8: 0000000000000002
r9: 0000000000000000 r10: 0000000000000001 r11: 0000000000000000
r12: ffff880029010000 r13: 0000000000000000 r14: ffff880029010c30
r15: 00000000ffffff8b ip: ffffffff81279f84 flags: 00010286 cs: 00000010
ss: 00000018 ds: 00000018 es: 00000018 fs: 00000018 gs: 00000018
[7]kdb> bt
Stack traceback for pid 26045
0xffff88020aadbf20 26045 25958 1 7 R 0xffff88020aadc3a8 *ceph-osd
ffff880131ca1978 0000000000000018 ffff880131ca1978 ffff880131ca1998
bb7933908cd5a32b 0000000000000000 0000000000000001 0000000000000002
0000000000000000 ffff880131ca19b8 0000000000000000 ffff880131ca19f8
Call Trace:
[<ffffffff8104986f>] ? warn_slowpath_common+0x9f/0xc0
[<ffffffff81257308>] ? __ext4_journal_stop+0x78/0xa0
[<ffffffff812578dc>] ? __ext4_handle_dirty_metadata+0xbc/0x140
[<ffffffff8122ca3e>] ? ext4_mark_iloc_dirty+0x40e/0x660
[<ffffffff8126a465>] ? ext4_xattr_set_handle+0x265/0x4a0
[<ffffffff8126a765>] ? ext4_xattr_set+0xc5/0x140
[<ffffffff8126b177>] ? ext4_xattr_user_set+0x47/0x50
[<ffffffff811a3fee>] ? generic_setxattr+0x6e/0x90
[<ffffffff811a48eb>] ? __vfs_setxattr_noperm+0x7b/0x1c0
[<ffffffff811a4af4>] ? vfs_setxattr+0xc4/0xd0
[<ffffffff811a4c3e>] ? setxattr+0x13e/0x1e0
[<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
[<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
[<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
[<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
[<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
[<ffffffff811a50de>] ? SyS_fsetxattr+0xbe/0x100
[<ffffffff81653782>] ? system_call_fastpath+0x16/0x1b

Let me know if there is anything else I can gather from this machine that
will help!

sage

2013-08-23 13:14:11

by Jan Kara

[permalink] [raw]
Subject: Re: crash in __jbd2_journal_file_buffer

On Thu 22-08-13 16:35:15, Sage Weil wrote:
> On Tue, 13 Aug 2013, Jan Kara wrote:
> > On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > > Full dmesg is attached.
> > Hum, nothing interesting in there...
> >
> > > Our QA seems to hit this with some regularity. Let me know if there's
> > > some combination of patches that would help shed more light!
> > If they can run with attached debug patch it could maybe sched some more
> > light. Please send also your System.map file together with the dmesg of the
> > kernel when the crash happens so that I can map addresses to function
> > names... Thanks!
>
> Okay, finally hit it again:
>
> <6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
> <3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
Great! Can you please send me /proc/kallsyms from the machine?

Honza

> <4>[77877.441200] ------------[ cut here ]------------
> <4>[77877.445845] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
>
> <4>[77877.497349] CPU: 7 PID: 26045 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00061-g546140d #1
> <4>[77877.505649] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> <4>[77877.513213] 0000000000000564 ffff880131ca1938 ffffffff81642d85 ffff8802272ef290
> <4>[77877.520694] 0000000000000000 ffff880131ca1978 ffffffff8104985c ffff880131ca19a0
> <4>[77877.528218] ffff88020f695aa0 0000000000000000 ffff880214c48b40 ffff88020be55000
> <4>[77877.535756] Call Trace:
> <4>[77877.538279] [<ffffffff81642d85>] dump_stack+0x46/0x58
> <4>[77877.543439] [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> <4>[77877.549548] [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> <4>[77877.555413] [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> <4>[77877.562288] [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> <4>[77877.569155] [<ffffffff81268e23>] ext4_xattr_release_block+0x103/0x1f0
> <4>[77877.575723] [<ffffffff812692b0>] ext4_xattr_block_set+0x1e0/0x910
> <4>[77877.581990] [<ffffffff8126a58b>] ext4_xattr_set_handle+0x38b/0x4a0
> <4>[77877.588335] [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> <4>[77877.594188] [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> <4>[77877.599837] [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> <4>[77877.605779] [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> <4>[77877.611514] [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> <4>[77877.617773] [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> <4>[77877.623103] [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> <4>[77877.628317] [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> <4>[77877.634260] [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> <4>[77877.640428] [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> <4>[77877.645847] [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> <4>[77877.652015] [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> <4>[77877.657897] [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> <4>[77877.663405] [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> <4>[77877.669488] ---[ end trace bb7933908cd5a32a ]---
> <2>[77877.674126] EXT4-fs error (device sda1) in ext4_handle_dirty_xattr_block:167: error 117
> <3>[77877.692983] Aborting journal on device sda1-8.
> <2>[77877.721561] EXT4-fs (sda1): Remounting filesystem read-only
> <0>[77877.721657] journal commit I/O error
> <0>[77877.721706] journal commit I/O error
> <0>[77877.721707] journal commit I/O error
> <2>[77877.727300] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> <2>[77877.727338] EXT4-fs (sda1): Remounting filesystem read-only
> <2>[77877.727613] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> <2>[77877.727618] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> <2>[77877.727625] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> <2>[77877.778239] EXT4-fs error (device sda1) in ext4_xattr_release_block:558: error 117
> <3>[77877.786051] Dirtying buffer without jh at 4302720332: state 10c029,jh added from 0xffffffff8127eb88 at 4302720326, removed from 0xffffffff8127b5b0 at 4302720274
> <4>[77877.800516] ------------[ cut here ]------------
> <4>[77877.805156] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
>
> <4>[77877.856583] CPU: 7 PID: 26045 Comm: ceph-osd Tainted: G W 3.11.0-rc5-ceph-00061-g546140d #1
> <4>[77877.865896] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> <4>[77877.873475] 0000000000000564 ffff880131ca19b8 ffffffff81642d85 ffff8802272ef290
> <4>[77877.880954] 0000000000000000 ffff880131ca19f8 ffffffff8104985c ffff880131ca1a20
> <4>[77877.888488] ffff8801c499be58 0000000000000000 ffff880029010000 ffff880029010c30
> <4>[77877.895962] Call Trace:
> <4>[77877.898484] [<ffffffff81642d85>] dump_stack+0x46/0x58
> <4>[77877.903643] [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> <4>[77877.909724] [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> <4>[77877.915577] [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> <4>[77877.922443] [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> <4>[77877.928353] [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> <4>[77877.935165] [<ffffffff8122ca3e>] ext4_mark_iloc_dirty+0x40e/0x660
> <4>[77877.941421] [<ffffffff8126a465>] ext4_xattr_set_handle+0x265/0x4a0
> <4>[77877.947766] [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> <4>[77877.953358] [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> <4>[77877.959354] [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> <4>[77877.965034] [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> <4>[77877.971289] [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> <4>[77877.976621] [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> <4>[77877.981837] [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> <4>[77877.987830] [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> <4>[77877.993944] [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> <4>[77877.999417] [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> <4>[77878.005530] [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> <4>[77878.011349] [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> <4>[77878.016856] [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> <4>[77878.022941] ---[ end trace bb7933908cd5a32b ]---
> [7]kdb> rd
> ax: ffff88020aadbf20 bx: ffff8800290100a0 cx: 0000000000000000
> dx: ffff8800290100a0 si: ffff8800290100a0 di: ffff8800290100a0
> bp: ffff880131ca1a28 sp: ffff880131ca1978 r8: 0000000000000002
> r9: 0000000000000000 r10: 0000000000000001 r11: 0000000000000000
> r12: ffff880029010000 r13: 0000000000000000 r14: ffff880029010c30
> r15: 00000000ffffff8b ip: ffffffff81279f84 flags: 00010286 cs: 00000010
> ss: 00000018 ds: 00000018 es: 00000018 fs: 00000018 gs: 00000018
> [7]kdb> bt
> Stack traceback for pid 26045
> 0xffff88020aadbf20 26045 25958 1 7 R 0xffff88020aadc3a8 *ceph-osd
> ffff880131ca1978 0000000000000018 ffff880131ca1978 ffff880131ca1998
> bb7933908cd5a32b 0000000000000000 0000000000000001 0000000000000002
> 0000000000000000 ffff880131ca19b8 0000000000000000 ffff880131ca19f8
> Call Trace:
> [<ffffffff8104986f>] ? warn_slowpath_common+0x9f/0xc0
> [<ffffffff81257308>] ? __ext4_journal_stop+0x78/0xa0
> [<ffffffff812578dc>] ? __ext4_handle_dirty_metadata+0xbc/0x140
> [<ffffffff8122ca3e>] ? ext4_mark_iloc_dirty+0x40e/0x660
> [<ffffffff8126a465>] ? ext4_xattr_set_handle+0x265/0x4a0
> [<ffffffff8126a765>] ? ext4_xattr_set+0xc5/0x140
> [<ffffffff8126b177>] ? ext4_xattr_user_set+0x47/0x50
> [<ffffffff811a3fee>] ? generic_setxattr+0x6e/0x90
> [<ffffffff811a48eb>] ? __vfs_setxattr_noperm+0x7b/0x1c0
> [<ffffffff811a4af4>] ? vfs_setxattr+0xc4/0xd0
> [<ffffffff811a4c3e>] ? setxattr+0x13e/0x1e0
> [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> [<ffffffff811a50de>] ? SyS_fsetxattr+0xbe/0x100
> [<ffffffff81653782>] ? system_call_fastpath+0x16/0x1b
>
> Let me know if there is anything else I can gather from this machine that
> will help!
>
> sage
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-08-23 15:02:45

by Sage Weil

[permalink] [raw]
Subject: Re: crash in __jbd2_journal_file_buffer

On Fri, 23 Aug 2013, Jan Kara wrote:
> On Thu 22-08-13 16:35:15, Sage Weil wrote:
> > On Tue, 13 Aug 2013, Jan Kara wrote:
> > > On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > > > Full dmesg is attached.
> > > Hum, nothing interesting in there...
> > >
> > > > Our QA seems to hit this with some regularity. Let me know if there's
> > > > some combination of patches that would help shed more light!
> > > If they can run with attached debug patch it could maybe sched some more
> > > light. Please send also your System.map file together with the dmesg of the
> > > kernel when the crash happens so that I can map addresses to function
> > > names... Thanks!
> >
> > Okay, finally hit it again:
> >
> > <6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
> > <3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
> Great! Can you please send me /proc/kallsyms from the machine?

Attached!

sage

>
> Honza
>
> > <4>[77877.441200] ------------[ cut here ]------------
> > <4>[77877.445845] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
> >
> > <4>[77877.497349] CPU: 7 PID: 26045 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00061-g546140d #1
> > <4>[77877.505649] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> > <4>[77877.513213] 0000000000000564 ffff880131ca1938 ffffffff81642d85 ffff8802272ef290
> > <4>[77877.520694] 0000000000000000 ffff880131ca1978 ffffffff8104985c ffff880131ca19a0
> > <4>[77877.528218] ffff88020f695aa0 0000000000000000 ffff880214c48b40 ffff88020be55000
> > <4>[77877.535756] Call Trace:
> > <4>[77877.538279] [<ffffffff81642d85>] dump_stack+0x46/0x58
> > <4>[77877.543439] [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> > <4>[77877.549548] [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> > <4>[77877.555413] [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> > <4>[77877.562288] [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> > <4>[77877.569155] [<ffffffff81268e23>] ext4_xattr_release_block+0x103/0x1f0
> > <4>[77877.575723] [<ffffffff812692b0>] ext4_xattr_block_set+0x1e0/0x910
> > <4>[77877.581990] [<ffffffff8126a58b>] ext4_xattr_set_handle+0x38b/0x4a0
> > <4>[77877.588335] [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> > <4>[77877.594188] [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> > <4>[77877.599837] [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> > <4>[77877.605779] [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> > <4>[77877.611514] [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> > <4>[77877.617773] [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> > <4>[77877.623103] [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> > <4>[77877.628317] [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> > <4>[77877.634260] [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > <4>[77877.640428] [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> > <4>[77877.645847] [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > <4>[77877.652015] [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> > <4>[77877.657897] [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> > <4>[77877.663405] [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> > <4>[77877.669488] ---[ end trace bb7933908cd5a32a ]---
> > <2>[77877.674126] EXT4-fs error (device sda1) in ext4_handle_dirty_xattr_block:167: error 117
> > <3>[77877.692983] Aborting journal on device sda1-8.
> > <2>[77877.721561] EXT4-fs (sda1): Remounting filesystem read-only
> > <0>[77877.721657] journal commit I/O error
> > <0>[77877.721706] journal commit I/O error
> > <0>[77877.721707] journal commit I/O error
> > <2>[77877.727300] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> > <2>[77877.727338] EXT4-fs (sda1): Remounting filesystem read-only
> > <2>[77877.727613] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> > <2>[77877.727618] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> > <2>[77877.727625] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> > <2>[77877.778239] EXT4-fs error (device sda1) in ext4_xattr_release_block:558: error 117
> > <3>[77877.786051] Dirtying buffer without jh at 4302720332: state 10c029,jh added from 0xffffffff8127eb88 at 4302720326, removed from 0xffffffff8127b5b0 at 4302720274
> > <4>[77877.800516] ------------[ cut here ]------------
> > <4>[77877.805156] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
> >
> > <4>[77877.856583] CPU: 7 PID: 26045 Comm: ceph-osd Tainted: G W 3.11.0-rc5-ceph-00061-g546140d #1
> > <4>[77877.865896] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> > <4>[77877.873475] 0000000000000564 ffff880131ca19b8 ffffffff81642d85 ffff8802272ef290
> > <4>[77877.880954] 0000000000000000 ffff880131ca19f8 ffffffff8104985c ffff880131ca1a20
> > <4>[77877.888488] ffff8801c499be58 0000000000000000 ffff880029010000 ffff880029010c30
> > <4>[77877.895962] Call Trace:
> > <4>[77877.898484] [<ffffffff81642d85>] dump_stack+0x46/0x58
> > <4>[77877.903643] [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> > <4>[77877.909724] [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> > <4>[77877.915577] [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> > <4>[77877.922443] [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> > <4>[77877.928353] [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> > <4>[77877.935165] [<ffffffff8122ca3e>] ext4_mark_iloc_dirty+0x40e/0x660
> > <4>[77877.941421] [<ffffffff8126a465>] ext4_xattr_set_handle+0x265/0x4a0
> > <4>[77877.947766] [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> > <4>[77877.953358] [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> > <4>[77877.959354] [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> > <4>[77877.965034] [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> > <4>[77877.971289] [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> > <4>[77877.976621] [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> > <4>[77877.981837] [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> > <4>[77877.987830] [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > <4>[77877.993944] [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> > <4>[77877.999417] [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > <4>[77878.005530] [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> > <4>[77878.011349] [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> > <4>[77878.016856] [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> > <4>[77878.022941] ---[ end trace bb7933908cd5a32b ]---
> > [7]kdb> rd
> > ax: ffff88020aadbf20 bx: ffff8800290100a0 cx: 0000000000000000
> > dx: ffff8800290100a0 si: ffff8800290100a0 di: ffff8800290100a0
> > bp: ffff880131ca1a28 sp: ffff880131ca1978 r8: 0000000000000002
> > r9: 0000000000000000 r10: 0000000000000001 r11: 0000000000000000
> > r12: ffff880029010000 r13: 0000000000000000 r14: ffff880029010c30
> > r15: 00000000ffffff8b ip: ffffffff81279f84 flags: 00010286 cs: 00000010
> > ss: 00000018 ds: 00000018 es: 00000018 fs: 00000018 gs: 00000018
> > [7]kdb> bt
> > Stack traceback for pid 26045
> > 0xffff88020aadbf20 26045 25958 1 7 R 0xffff88020aadc3a8 *ceph-osd
> > ffff880131ca1978 0000000000000018 ffff880131ca1978 ffff880131ca1998
> > bb7933908cd5a32b 0000000000000000 0000000000000001 0000000000000002
> > 0000000000000000 ffff880131ca19b8 0000000000000000 ffff880131ca19f8
> > Call Trace:
> > [<ffffffff8104986f>] ? warn_slowpath_common+0x9f/0xc0
> > [<ffffffff81257308>] ? __ext4_journal_stop+0x78/0xa0
> > [<ffffffff812578dc>] ? __ext4_handle_dirty_metadata+0xbc/0x140
> > [<ffffffff8122ca3e>] ? ext4_mark_iloc_dirty+0x40e/0x660
> > [<ffffffff8126a465>] ? ext4_xattr_set_handle+0x265/0x4a0
> > [<ffffffff8126a765>] ? ext4_xattr_set+0xc5/0x140
> > [<ffffffff8126b177>] ? ext4_xattr_user_set+0x47/0x50
> > [<ffffffff811a3fee>] ? generic_setxattr+0x6e/0x90
> > [<ffffffff811a48eb>] ? __vfs_setxattr_noperm+0x7b/0x1c0
> > [<ffffffff811a4af4>] ? vfs_setxattr+0xc4/0xd0
> > [<ffffffff811a4c3e>] ? setxattr+0x13e/0x1e0
> > [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> > [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> > [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> > [<ffffffff811a50de>] ? SyS_fsetxattr+0xbe/0x100
> > [<ffffffff81653782>] ? system_call_fastpath+0x16/0x1b
> >
> > Let me know if there is anything else I can gather from this machine that
> > will help!
> >
> > sage
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR
>
>


Attachments:
k (0.00 B)

2013-08-23 19:52:55

by Jan Kara

[permalink] [raw]
Subject: Re: crash in __jbd2_journal_file_buffer

On Fri 23-08-13 08:02:45, Sage Weil wrote:
> On Fri, 23 Aug 2013, Jan Kara wrote:
> > On Thu 22-08-13 16:35:15, Sage Weil wrote:
> > > On Tue, 13 Aug 2013, Jan Kara wrote:
> > > > On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > > > > Full dmesg is attached.
> > > > Hum, nothing interesting in there...
> > > >
> > > > > Our QA seems to hit this with some regularity. Let me know if there's
> > > > > some combination of patches that would help shed more light!
> > > > If they can run with attached debug patch it could maybe sched some more
> > > > light. Please send also your System.map file together with the dmesg of the
> > > > kernel when the crash happens so that I can map addresses to function
> > > > names... Thanks!
> > >
> > > Okay, finally hit it again:
> > >
> > > <6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
> > > <3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
> > Great! Can you please send me /proc/kallsyms from the machine?
>
> Attached!
Sadly not. I think your MUA got confused by the fact that files in /proc
have 0 length and attached empty file.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-08-23 20:48:12

by Jan Kara

[permalink] [raw]
Subject: Re: crash in __jbd2_journal_file_buffer

On Thu 22-08-13 16:35:15, Sage Weil wrote:
> On Tue, 13 Aug 2013, Jan Kara wrote:
> > On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > > Full dmesg is attached.
> > Hum, nothing interesting in there...
> >
> > > Our QA seems to hit this with some regularity. Let me know if there's
> > > some combination of patches that would help shed more light!
> > If they can run with attached debug patch it could maybe sched some more
> > light. Please send also your System.map file together with the dmesg of the
> > kernel when the crash happens so that I can map addresses to function
> > names... Thanks!
>
> Okay, finally hit it again:
>
> <6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
> <3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
> <4>[77877.441200] ------------[ cut here ]------------
> <4>[77877.445845] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
>
> <4>[77877.497349] CPU: 7 PID: 26045 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00061-g546140d #1
> <4>[77877.505649] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> <4>[77877.513213] 0000000000000564 ffff880131ca1938 ffffffff81642d85 ffff8802272ef290
> <4>[77877.520694] 0000000000000000 ffff880131ca1978 ffffffff8104985c ffff880131ca19a0
> <4>[77877.528218] ffff88020f695aa0 0000000000000000 ffff880214c48b40 ffff88020be55000
> <4>[77877.535756] Call Trace:
> <4>[77877.538279] [<ffffffff81642d85>] dump_stack+0x46/0x58
> <4>[77877.543439] [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> <4>[77877.549548] [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> <4>[77877.555413] [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> <4>[77877.562288] [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> <4>[77877.569155] [<ffffffff81268e23>] ext4_xattr_release_block+0x103/0x1f0
> <4>[77877.575723] [<ffffffff812692b0>] ext4_xattr_block_set+0x1e0/0x910
> <4>[77877.581990] [<ffffffff8126a58b>] ext4_xattr_set_handle+0x38b/0x4a0
> <4>[77877.588335] [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> <4>[77877.594188] [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> <4>[77877.599837] [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> <4>[77877.605779] [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> <4>[77877.611514] [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> <4>[77877.617773] [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> <4>[77877.623103] [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> <4>[77877.628317] [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> <4>[77877.634260] [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> <4>[77877.640428] [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> <4>[77877.645847] [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> <4>[77877.652015] [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> <4>[77877.657897] [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> <4>[77877.663405] [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> <4>[77877.669488] ---[ end trace bb7933908cd5a32a ]---
I was scratching my head for a while how this could happen but I think I
see the race now. Think we have two inodes A and B which share the same
xattr block (so &BHDR(bh)->h_refcount == 2). Now we are changing xattrs for
both inodes in parallel - that can easily happen because xattr locking is
generally per inode (EXT4_I(inode)->xattr_sem). The following race then
happens:
CPU1 CPU2
ext4_xattr_release_block() ext4_xattr_release_block()
lock_buffer(bh);
/* False */
if (BHDR(bh)->h_refcount == cpu_to_le32(1))
} else {
le32_add_cpu(&BHDR(bh)->h_refcount, -1);
unlock_buffer(bh);
lock_buffer(bh);
/* True */
if (BHDR(bh)->h_refcount == cpu_to_le32(1))
get_bh(bh);
ext4_free_blocks()
...
jbd2_journal_forget()
jbd2_journal_unfile_buffer()
-> JH is gone
error = ext4_handle_dirty_xattr_block(handle, inode, bh);
-> triggers warning

Now easy fix would be to move ext4_handle_dirty_xattr_block() before
unlock_buffer() but I don't really like that because of locking
constraints. I'll think about it...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-09-10 22:19:04

by Jan Kara

[permalink] [raw]
Subject: Re: crash in __jbd2_journal_file_buffer

On Fri 23-08-13 22:48:10, Jan Kara wrote:
> On Thu 22-08-13 16:35:15, Sage Weil wrote:
> > On Tue, 13 Aug 2013, Jan Kara wrote:
> > > On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > > > Full dmesg is attached.
> > > Hum, nothing interesting in there...
> > >
> > > > Our QA seems to hit this with some regularity. Let me know if there's
> > > > some combination of patches that would help shed more light!
> > > If they can run with attached debug patch it could maybe sched some more
> > > light. Please send also your System.map file together with the dmesg of the
> > > kernel when the crash happens so that I can map addresses to function
> > > names... Thanks!
> >
> > Okay, finally hit it again:
> >
> > <6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
> > <3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
> > <4>[77877.441200] ------------[ cut here ]------------
> > <4>[77877.445845] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
> >
> > <4>[77877.497349] CPU: 7 PID: 26045 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00061-g546140d #1
> > <4>[77877.505649] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> > <4>[77877.513213] 0000000000000564 ffff880131ca1938 ffffffff81642d85 ffff8802272ef290
> > <4>[77877.520694] 0000000000000000 ffff880131ca1978 ffffffff8104985c ffff880131ca19a0
> > <4>[77877.528218] ffff88020f695aa0 0000000000000000 ffff880214c48b40 ffff88020be55000
> > <4>[77877.535756] Call Trace:
> > <4>[77877.538279] [<ffffffff81642d85>] dump_stack+0x46/0x58
> > <4>[77877.543439] [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> > <4>[77877.549548] [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> > <4>[77877.555413] [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> > <4>[77877.562288] [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> > <4>[77877.569155] [<ffffffff81268e23>] ext4_xattr_release_block+0x103/0x1f0
> > <4>[77877.575723] [<ffffffff812692b0>] ext4_xattr_block_set+0x1e0/0x910
> > <4>[77877.581990] [<ffffffff8126a58b>] ext4_xattr_set_handle+0x38b/0x4a0
> > <4>[77877.588335] [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> > <4>[77877.594188] [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> > <4>[77877.599837] [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> > <4>[77877.605779] [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> > <4>[77877.611514] [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> > <4>[77877.617773] [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> > <4>[77877.623103] [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> > <4>[77877.628317] [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> > <4>[77877.634260] [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > <4>[77877.640428] [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> > <4>[77877.645847] [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > <4>[77877.652015] [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> > <4>[77877.657897] [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> > <4>[77877.663405] [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> > <4>[77877.669488] ---[ end trace bb7933908cd5a32a ]---
> I was scratching my head for a while how this could happen but I think I
> see the race now. Think we have two inodes A and B which share the same
> xattr block (so &BHDR(bh)->h_refcount == 2). Now we are changing xattrs for
> both inodes in parallel - that can easily happen because xattr locking is
> generally per inode (EXT4_I(inode)->xattr_sem). The following race then
> happens:
> CPU1 CPU2
> ext4_xattr_release_block() ext4_xattr_release_block()
> lock_buffer(bh);
> /* False */
> if (BHDR(bh)->h_refcount == cpu_to_le32(1))
> } else {
> le32_add_cpu(&BHDR(bh)->h_refcount, -1);
> unlock_buffer(bh);
> lock_buffer(bh);
> /* True */
> if (BHDR(bh)->h_refcount == cpu_to_le32(1))
> get_bh(bh);
> ext4_free_blocks()
> ...
> jbd2_journal_forget()
> jbd2_journal_unfile_buffer()
> -> JH is gone
> error = ext4_handle_dirty_xattr_block(handle, inode, bh);
> -> triggers warning
>
> Now easy fix would be to move ext4_handle_dirty_xattr_block() before
> unlock_buffer() but I don't really like that because of locking
> constraints. I'll think about it...
So finally I've got back to this. Attached is a somewhat ugly patch that
should fix this issue. Can you please test it? Thanks!

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR


Attachments:
(No filename) (4.60 kB)
0001-ext4-Fix-jbd2-warning-under-heavy-xattr-load.patch (5.30 kB)
Download all attachments

2013-09-10 22:32:47

by Sage Weil

[permalink] [raw]
Subject: Re: crash in __jbd2_journal_file_buffer

On Wed, 11 Sep 2013, Jan Kara wrote:
> On Fri 23-08-13 22:48:10, Jan Kara wrote:
> > On Thu 22-08-13 16:35:15, Sage Weil wrote:
> > > On Tue, 13 Aug 2013, Jan Kara wrote:
> > > > On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > > > > Full dmesg is attached.
> > > > Hum, nothing interesting in there...
> > > >
> > > > > Our QA seems to hit this with some regularity. Let me know if there's
> > > > > some combination of patches that would help shed more light!
> > > > If they can run with attached debug patch it could maybe sched some more
> > > > light. Please send also your System.map file together with the dmesg of the
> > > > kernel when the crash happens so that I can map addresses to function
> > > > names... Thanks!
> > >
> > > Okay, finally hit it again:
> > >
> > > <6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
> > > <3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
> > > <4>[77877.441200] ------------[ cut here ]------------
> > > <4>[77877.445845] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
> > >
> > > <4>[77877.497349] CPU: 7 PID: 26045 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00061-g546140d #1
> > > <4>[77877.505649] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> > > <4>[77877.513213] 0000000000000564 ffff880131ca1938 ffffffff81642d85 ffff8802272ef290
> > > <4>[77877.520694] 0000000000000000 ffff880131ca1978 ffffffff8104985c ffff880131ca19a0
> > > <4>[77877.528218] ffff88020f695aa0 0000000000000000 ffff880214c48b40 ffff88020be55000
> > > <4>[77877.535756] Call Trace:
> > > <4>[77877.538279] [<ffffffff81642d85>] dump_stack+0x46/0x58
> > > <4>[77877.543439] [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> > > <4>[77877.549548] [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> > > <4>[77877.555413] [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> > > <4>[77877.562288] [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> > > <4>[77877.569155] [<ffffffff81268e23>] ext4_xattr_release_block+0x103/0x1f0
> > > <4>[77877.575723] [<ffffffff812692b0>] ext4_xattr_block_set+0x1e0/0x910
> > > <4>[77877.581990] [<ffffffff8126a58b>] ext4_xattr_set_handle+0x38b/0x4a0
> > > <4>[77877.588335] [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> > > <4>[77877.594188] [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> > > <4>[77877.599837] [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> > > <4>[77877.605779] [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> > > <4>[77877.611514] [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> > > <4>[77877.617773] [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> > > <4>[77877.623103] [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> > > <4>[77877.628317] [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> > > <4>[77877.634260] [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > > <4>[77877.640428] [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> > > <4>[77877.645847] [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > > <4>[77877.652015] [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> > > <4>[77877.657897] [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> > > <4>[77877.663405] [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> > > <4>[77877.669488] ---[ end trace bb7933908cd5a32a ]---
> > I was scratching my head for a while how this could happen but I think I
> > see the race now. Think we have two inodes A and B which share the same
> > xattr block (so &BHDR(bh)->h_refcount == 2). Now we are changing xattrs for
> > both inodes in parallel - that can easily happen because xattr locking is
> > generally per inode (EXT4_I(inode)->xattr_sem). The following race then
> > happens:
> > CPU1 CPU2
> > ext4_xattr_release_block() ext4_xattr_release_block()
> > lock_buffer(bh);
> > /* False */
> > if (BHDR(bh)->h_refcount == cpu_to_le32(1))
> > } else {
> > le32_add_cpu(&BHDR(bh)->h_refcount, -1);
> > unlock_buffer(bh);
> > lock_buffer(bh);
> > /* True */
> > if (BHDR(bh)->h_refcount == cpu_to_le32(1))
> > get_bh(bh);
> > ext4_free_blocks()
> > ...
> > jbd2_journal_forget()
> > jbd2_journal_unfile_buffer()
> > -> JH is gone
> > error = ext4_handle_dirty_xattr_block(handle, inode, bh);
> > -> triggers warning
> >
> > Now easy fix would be to move ext4_handle_dirty_xattr_block() before
> > unlock_buffer() but I don't really like that because of locking
> > constraints. I'll think about it...
> So finally I've got back to this. Attached is a somewhat ugly patch that
> should fix this issue. Can you please test it? Thanks!

Sure; added it to our test tree. I haven't seen this in a week probably,
so it'll be hard to definitively say it's fixed, but it'll get plenty of
testing. :)

Thanks!
sage

2013-11-11 22:20:56

by Jan Kara

[permalink] [raw]
Subject: Re: crash in __jbd2_journal_file_buffer

On Tue 10-09-13 15:32:47, Sage Weil wrote:
> On Wed, 11 Sep 2013, Jan Kara wrote:
> > On Fri 23-08-13 22:48:10, Jan Kara wrote:
> > > On Thu 22-08-13 16:35:15, Sage Weil wrote:
> > > > On Tue, 13 Aug 2013, Jan Kara wrote:
> > > > > On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > > > > > Full dmesg is attached.
> > > > > Hum, nothing interesting in there...
> > > > >
> > > > > > Our QA seems to hit this with some regularity. Let me know if there's
> > > > > > some combination of patches that would help shed more light!
> > > > > If they can run with attached debug patch it could maybe sched some more
> > > > > light. Please send also your System.map file together with the dmesg of the
> > > > > kernel when the crash happens so that I can map addresses to function
> > > > > names... Thanks!
> > > >
> > > > Okay, finally hit it again:
> > > >
> > > > <6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
> > > > <3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
> > > > <4>[77877.441200] ------------[ cut here ]------------
> > > > <4>[77877.445845] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
> > > >
> > > > <4>[77877.497349] CPU: 7 PID: 26045 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00061-g546140d #1
> > > > <4>[77877.505649] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> > > > <4>[77877.513213] 0000000000000564 ffff880131ca1938 ffffffff81642d85 ffff8802272ef290
> > > > <4>[77877.520694] 0000000000000000 ffff880131ca1978 ffffffff8104985c ffff880131ca19a0
> > > > <4>[77877.528218] ffff88020f695aa0 0000000000000000 ffff880214c48b40 ffff88020be55000
> > > > <4>[77877.535756] Call Trace:
> > > > <4>[77877.538279] [<ffffffff81642d85>] dump_stack+0x46/0x58
> > > > <4>[77877.543439] [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> > > > <4>[77877.549548] [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> > > > <4>[77877.555413] [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> > > > <4>[77877.562288] [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> > > > <4>[77877.569155] [<ffffffff81268e23>] ext4_xattr_release_block+0x103/0x1f0
> > > > <4>[77877.575723] [<ffffffff812692b0>] ext4_xattr_block_set+0x1e0/0x910
> > > > <4>[77877.581990] [<ffffffff8126a58b>] ext4_xattr_set_handle+0x38b/0x4a0
> > > > <4>[77877.588335] [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> > > > <4>[77877.594188] [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> > > > <4>[77877.599837] [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> > > > <4>[77877.605779] [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> > > > <4>[77877.611514] [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> > > > <4>[77877.617773] [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> > > > <4>[77877.623103] [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> > > > <4>[77877.628317] [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> > > > <4>[77877.634260] [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > > > <4>[77877.640428] [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> > > > <4>[77877.645847] [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > > > <4>[77877.652015] [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> > > > <4>[77877.657897] [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> > > > <4>[77877.663405] [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> > > > <4>[77877.669488] ---[ end trace bb7933908cd5a32a ]---
> > > I was scratching my head for a while how this could happen but I think I
> > > see the race now. Think we have two inodes A and B which share the same
> > > xattr block (so &BHDR(bh)->h_refcount == 2). Now we are changing xattrs for
> > > both inodes in parallel - that can easily happen because xattr locking is
> > > generally per inode (EXT4_I(inode)->xattr_sem). The following race then
> > > happens:
> > > CPU1 CPU2
> > > ext4_xattr_release_block() ext4_xattr_release_block()
> > > lock_buffer(bh);
> > > /* False */
> > > if (BHDR(bh)->h_refcount == cpu_to_le32(1))
> > > } else {
> > > le32_add_cpu(&BHDR(bh)->h_refcount, -1);
> > > unlock_buffer(bh);
> > > lock_buffer(bh);
> > > /* True */
> > > if (BHDR(bh)->h_refcount == cpu_to_le32(1))
> > > get_bh(bh);
> > > ext4_free_blocks()
> > > ...
> > > jbd2_journal_forget()
> > > jbd2_journal_unfile_buffer()
> > > -> JH is gone
> > > error = ext4_handle_dirty_xattr_block(handle, inode, bh);
> > > -> triggers warning
> > >
> > > Now easy fix would be to move ext4_handle_dirty_xattr_block() before
> > > unlock_buffer() but I don't really like that because of locking
> > > constraints. I'll think about it...
> > So finally I've got back to this. Attached is a somewhat ugly patch that
> > should fix this issue. Can you please test it? Thanks!
>
> Sure; added it to our test tree. I haven't seen this in a week probably,
> so it'll be hard to definitively say it's fixed, but it'll get plenty of
> testing. :)
Any luck with testing? It has been two months so if the bug didn't
reappear I'd hope it really got fixed...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2014-04-05 19:20:07

by Sage Weil

[permalink] [raw]
Subject: Re: crash in __jbd2_journal_file_buffer

Hi Jan,

Thanks for reminding me about this at LSF. Both your debug patch and this
fix were dropped from our testing branch a while ago, but we just hit the
crash again yesterday on 3.14-rc5. The fix was in our tree for quite some
time, though, and did not cause any problems; I think it got dropped when
we rebased for the 3.14 cycle. I'll add it back now.

Unfortunately this triggers pretty infrequently so I'm not sure I can give
a definitive "this fixes it," but it feels that way, and we haven't seen
any fallout as a result.

Let me know if there's anything else we can put into the test kernel that
might help out.

Thanks!
sage


On Wed, 11 Sep 2013, Jan Kara wrote:

> On Fri 23-08-13 22:48:10, Jan Kara wrote:
> > On Thu 22-08-13 16:35:15, Sage Weil wrote:
> > > On Tue, 13 Aug 2013, Jan Kara wrote:
> > > > On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > > > > Full dmesg is attached.
> > > > Hum, nothing interesting in there...
> > > >
> > > > > Our QA seems to hit this with some regularity. Let me know if there's
> > > > > some combination of patches that would help shed more light!
> > > > If they can run with attached debug patch it could maybe sched some more
> > > > light. Please send also your System.map file together with the dmesg of the
> > > > kernel when the crash happens so that I can map addresses to function
> > > > names... Thanks!
> > >
> > > Okay, finally hit it again:
> > >
> > > <6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
> > > <3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
> > > <4>[77877.441200] ------------[ cut here ]------------
> > > <4>[77877.445845] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
> > >
> > > <4>[77877.497349] CPU: 7 PID: 26045 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00061-g546140d #1
> > > <4>[77877.505649] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> > > <4>[77877.513213] 0000000000000564 ffff880131ca1938 ffffffff81642d85 ffff8802272ef290
> > > <4>[77877.520694] 0000000000000000 ffff880131ca1978 ffffffff8104985c ffff880131ca19a0
> > > <4>[77877.528218] ffff88020f695aa0 0000000000000000 ffff880214c48b40 ffff88020be55000
> > > <4>[77877.535756] Call Trace:
> > > <4>[77877.538279] [<ffffffff81642d85>] dump_stack+0x46/0x58
> > > <4>[77877.543439] [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> > > <4>[77877.549548] [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> > > <4>[77877.555413] [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> > > <4>[77877.562288] [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> > > <4>[77877.569155] [<ffffffff81268e23>] ext4_xattr_release_block+0x103/0x1f0
> > > <4>[77877.575723] [<ffffffff812692b0>] ext4_xattr_block_set+0x1e0/0x910
> > > <4>[77877.581990] [<ffffffff8126a58b>] ext4_xattr_set_handle+0x38b/0x4a0
> > > <4>[77877.588335] [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> > > <4>[77877.594188] [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> > > <4>[77877.599837] [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> > > <4>[77877.605779] [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> > > <4>[77877.611514] [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> > > <4>[77877.617773] [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> > > <4>[77877.623103] [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> > > <4>[77877.628317] [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> > > <4>[77877.634260] [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > > <4>[77877.640428] [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> > > <4>[77877.645847] [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > > <4>[77877.652015] [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> > > <4>[77877.657897] [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> > > <4>[77877.663405] [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> > > <4>[77877.669488] ---[ end trace bb7933908cd5a32a ]---
> > I was scratching my head for a while how this could happen but I think I
> > see the race now. Think we have two inodes A and B which share the same
> > xattr block (so &BHDR(bh)->h_refcount == 2). Now we are changing xattrs for
> > both inodes in parallel - that can easily happen because xattr locking is
> > generally per inode (EXT4_I(inode)->xattr_sem). The following race then
> > happens:
> > CPU1 CPU2
> > ext4_xattr_release_block() ext4_xattr_release_block()
> > lock_buffer(bh);
> > /* False */
> > if (BHDR(bh)->h_refcount == cpu_to_le32(1))
> > } else {
> > le32_add_cpu(&BHDR(bh)->h_refcount, -1);
> > unlock_buffer(bh);
> > lock_buffer(bh);
> > /* True */
> > if (BHDR(bh)->h_refcount == cpu_to_le32(1))
> > get_bh(bh);
> > ext4_free_blocks()
> > ...
> > jbd2_journal_forget()
> > jbd2_journal_unfile_buffer()
> > -> JH is gone
> > error = ext4_handle_dirty_xattr_block(handle, inode, bh);
> > -> triggers warning
> >
> > Now easy fix would be to move ext4_handle_dirty_xattr_block() before
> > unlock_buffer() but I don't really like that because of locking
> > constraints. I'll think about it...
> So finally I've got back to this. Attached is a somewhat ugly patch that
> should fix this issue. Can you please test it? Thanks!
>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR
>