2016-11-21 15:28:03

by Wolfgang Walter

Subject: JBD2: Spotted dirty metadata buffer....

Hello,

I'm testing EXT4 with an external journal (data=journal). When writing I rather often get

JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1008028301). There's a risk of filesystem corruption in case of system crash.

No other message is logged. If I unmount the filesystem and do a forced fsck, all seems fine.

The filesystem is on an LV (which is in turn backed by DRBD), and so is its journal.

I'm using stable kernel 4.8.8.

I created the journal with:

mkfs.ext4 -O journal_dev -v -b 4096 -L jdyn /dev/export/jdyn

I created the filesystem with:

mkfs.ext4 -J device=UUID=625d871f-c278-4acb-916d-774dc78dbd8a -v -b 4096 -E stride=$((512/4)),stripe_width=$((512/4*3)),lazy_itable_init=0,nodiscard -O inline_data,mmp -L dyn /dev/export/dyn


I tested it also with data=writeback. I didn't see these errors then.

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


2016-11-22 00:49:43

by Andreas Dilger

Subject: Re: JBD2: Spotted dirty metadata buffer....

On Nov 21, 2016, at 8:28 AM, Wolfgang Walter <[email protected]> wrote:
>
> Hello,
>
> I'm testing EXT4 with an external journal (data=journal). When writing I rather often get
>
> JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1008028301). There's a risk of filesystem corruption in case of system crash.

Can you please determine what file this block belongs to and/or what kind
of metadata block it is. You can check "dumpe2fs /dev/dm-22" to list all
of the metadata blocks, and if it isn't listed as filesystem metadata, you
can run "debugfs -c -R 'icheck 1008028301' /dev/dm-22" to find which file
this block belongs to, then "debugfs -c -R 'ncheck <inum>' /dev/dm-22" after
you have the inode number.
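
Roughly, as a one-shot sketch (block number and device taken from your log; the
awk bit just pulls whatever inode number icheck reports):

BLK=1008028301; DEV=/dev/dm-22
INUM=$(debugfs -c -R "icheck $BLK" "$DEV" 2>/dev/null | awk -v b="$BLK" '$1 == b {print $2}')
debugfs -c -R "ncheck $INUM" "$DEV" 2>/dev/null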

> No other message is logged. If I unmount the filesystem and do a forced fsck, all seems fine.

This is likely a bug in the code, marking a block dirty without setting
up the journaling for it correctly. A few bugs like this were fixed a
while ago by Eric, but those fixes should be in 4.8.8.

> The filesystem is on an LV (which is in turn backed by DRBD), and so is its journal.
>
> I'm using stable kernel 4.8.8.
>
> I created the journal with:
>
> mkfs.ext4 -O journal_dev -v -b 4096 -L jdyn /dev/export/jdyn
>
> I created the filesystem with:
>
> mkfs.ext4 -J device=UUID=625d871f-c278-4acb-916d-774dc78dbd8a -v -b 4096 -E stride=$((512/4)),stripe_width=$((512/4*3)),lazy_itable_init=0,nodiscard -O inline_data,mmp -L dyn /dev/export/dyn

It may be that inline_data is the culprit, since this is a relatively new
feature that isn't widely used yet.

Cheers, Andreas

> I tested it also with data=writeback. I didn't see these errors then.
>
> Regards,
> --
> Wolfgang Walter
> Studentenwerk München
> Anstalt des öffentlichen Rechts



2016-11-22 13:56:06

by Wolfgang Walter

Subject: Re: JBD2: Spotted dirty metadata buffer....

On Monday, 21 November 2016, at 17:49:36, you wrote:
> On Nov 21, 2016, at 8:28 AM, Wolfgang Walter <[email protected]> wrote:
> > Hello,
> >
> > I'm testing EXT4 with an external journal (data=journal). When writing I
> > rather often get
> > JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1008028301).
> > There's a risk of filesystem corruption in case of system crash.
> Can you please determine what file this block belongs to and/or what kind
> of metadata block it is. You can check "dumpe2fs /dev/dm-22" to list all
> of the metadata blocks, and if it isn't listed as filesystem metadata, you
> can run "debugfs -c -R 'icheck 1008028301' /dev/dm-22" to find which file
> this block belongs to, then "debugfs -c -R 'ncheck <inum>' /dev/dm-22" after
> you have the inode number.

I think this is the block group which covers 1008028301:

Group 30762: (Blocks 1008009216-1008041983) csum 0xb3c0 [INODE_UNINIT, ITABLE_ZEROED]
Block bitmap at 1007681546 (bg #30752 + 10), csum 0x94b3d052
Inode bitmap at 1007681562 (bg #30752 + 26), csum 0x00000000
Inode table at 1007684128-1007684383 (bg #30752 + 2592)
8340 free blocks, 4096 free inodes, 0 directories, 4096 unused inodes
Free blocks: 1008021210-1008021211, 1008021506-1008021511, 1008021520-1008021698, 1008021749-1008021887, 1008022528
-1008022655, 1008024576-1008024768, 1008024839-1008024959, 1008026624-1008026879, 1008028524-1008035839
Free inodes: 126001153-126005248

I'm not sure if I read it correctly, but is 1008028301 part of the metadata?
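
(For reference, the group containing a given block can also be computed directly;
with 4 KiB blocks the first data block is 0 and there are 32768 blocks per group, so:

dumpe2fs -h /dev/dm-22 | grep 'Blocks per group'   # 32768 here
echo $(( 1008028301 / 32768 ))                     # -> 30762, the group shown above
echo $(( 30762 * 32768 ))                          # -> 1008009216, that group's first block
)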


I checked all of the other blocks logged so far and they all look alike: each lies
between the <x> and <y> of a "(Blocks <x>-<y>)" line and above the <b> of the
corresponding "Inode table at <a>-<b>" line.

Another thing I found is that no block has been mentioned more than twice.
When a block is mentioned twice, the two messages are within a second or so of
each other. Maybe this is simply because they were only touched when I did the rsync.

Here is an example:

[303709.144704] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1085342550). There's a risk of filesystem corruption in case of system crash.
[303709.145561] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1085342538). There's a risk of filesystem corruption in case of system crash.
[303709.145571] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1085342527). There's a risk of filesystem corruption in case of system crash.
[303709.145579] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1085342516). There's a risk of filesystem corruption in case of system crash.
[303709.145593] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1085342504). There's a risk of filesystem corruption in case of system crash.
[303709.145604] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1085342493). There's a risk of filesystem corruption in case of system crash.
[303709.145612] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1085342482). There's a risk of filesystem corruption in case of system crash.
[303709.145625] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1085342469). There's a risk of filesystem corruption in case of system crash.
[303719.119889] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1085343261). There's a risk of filesystem corruption in case of system crash.
[303719.119896] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1085343263). There's a risk of filesystem corruption in case of system crash.
[303719.120074] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1085343328). There's a risk of filesystem corruption in case of system crash.
[303719.120079] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1085343330). There's a risk of filesystem corruption in case of system crash.
[303719.120083] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1085343332). There's a risk of filesystem corruption in case of system crash.
[303719.121607] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 148936933). There's a risk of filesystem corruption in case of system crash.
[303719.121625] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 148936936). There's a risk of filesystem corruption in case of system crash.
[303719.121651] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 148936948). There's a risk of filesystem corruption in case of system crash.
[303719.121701] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 148936962). There's a risk of filesystem corruption in case of system crash.
[303719.121814] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 198776202). There's a risk of filesystem corruption in case of system crash.
[303719.121887] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 198776224). There's a risk of filesystem corruption in case of system crash.
[303719.149886] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 198776224). There's a risk of filesystem corruption in case of system crash.
[303719.149912] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 198776202). There's a risk of filesystem corruption in case of system crash.
[303719.167924] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 148936962). There's a risk of filesystem corruption in case of system crash.
[303719.167941] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 148936948). There's a risk of filesystem corruption in case of system crash.
[303719.167950] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 148936936). There's a risk of filesystem corruption in case of system crash.
[303719.167956] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 148936933). There's a risk of filesystem corruption in case of system crash.
[303719.167999] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1085343332). There's a risk of filesystem corruption in case of system crash.
[303719.168001] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1085343330). There's a risk of filesystem corruption in case of system crash.
[303719.168004] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1085343328). There's a risk of filesystem corruption in case of system crash.
[303719.168054] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1085343263). There's a risk of filesystem corruption in case of system crash.
[303719.168056] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1085343261). There's a risk of filesystem corruption in case of system crash.


>
> > No other message is logged. If I unmount the filesystem and do a forced
> > fsck, all seems fine.
> This is likely a bug in the code, marking a block dirty without setting
> up the journaling for it correctly. A few bugs like this were fixed a
> while ago by Eric, but those fixes should be in 4.8.8.
>
> > The filesystem is on an LV (which is in turn backed by DRBD), and so is
> > its journal.
> >
> > I'm using stable kernel 4.8.8.
> >
> > I created the journal with:
> > mkfs.ext4 -O journal_dev -v -b 4096 -L jdyn /dev/export/jdyn
> >
> > I created the filesystem with:
> > mkfs.ext4 -J device=UUID=625d871f-c278-4acb-916d-774dc78dbd8a -v -b 4096
> > -E
> > stride=$((512/4)),stripe_width=$((512/4*3)),lazy_itable_init=0,nodiscard
> > -O inline_data,mmp -L dyn /dev/export/dyn
> It may be that inline_data is the culprit, since this is a relatively new
> feature that isn't widely used yet.
>
> Cheers, Andreas
>
> > I tested it also with data=writeback. I didn't see these errors then.
> >
> > Regards,
> > --
> > Wolfgang Walter
> > Studentenwerk München
> > Anstalt des öffentlichen Rechts
>
> Cheers, Andreas

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts

2016-11-22 23:03:03

by Andreas Dilger

Subject: Re: JBD2: Spotted dirty metadata buffer....

On Nov 22, 2016, at 6:56 AM, Wolfgang Walter <[email protected]> wrote:
>
> On Monday, 21 November 2016, at 17:49:36, you wrote:
>> On Nov 21, 2016, at 8:28 AM, Wolfgang Walter <[email protected]> wrote:
>>> Hello,
>>>
>>> I'm testing EXT4 with an external journal (data=journal). When writing I
>>> rather often get
>>> JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1008028301).
>>> There's a risk of filesystem corruption in case of system crash.
>> Can you please determine what file this block belongs to and/or what kind
>> of metadata block it is. You can check "dumpe2fs /dev/dm-22" to list all
>> of the metadata blocks, and if it isn't listed as filesystem metadata, you
>> can run "debugfs -c -R 'icheck 1008028301' /dev/dm-22" to find which file
>> this block belongs to, then "debugfs -c -R 'ncheck <inum>' /dev/dm-22" after
>> you have the inode number.
>
> I think this is the block group which covers 1008028301:
>
> Group 30762: (Blocks 1008009216-1008041983) csum 0xb3c0 [INODE_UNINIT, ITABLE_ZEROED]
> Block bitmap at 1007681546 (bg #30752 + 10), csum 0x94b3d052
> Inode bitmap at 1007681562 (bg #30752 + 26), csum 0x00000000
> Inode table at 1007684128-1007684383 (bg #30752 + 2592)
> 8340 free blocks, 4096 free inodes, 0 directories, 4096 unused inodes
> Free blocks: 1008021210-1008021211, 1008021506-1008021511, 1008021520-1008021698, 1008021749-1008021887, 1008022528
> -1008022655, 1008024576-1008024768, 1008024839-1008024959, 1008026624-1008026879, 1008028524-1008035839
> Free inodes: 126001153-126005248
>
> I'm not sure if I read it correctly, but is 1008028301 part of the metadata?
>
> I checked all of the other blocks logged so far and they all look alike: each
> lies between the <x> and <y> of a "(Blocks <x>-<y>)" line and above the <b>
> of the corresponding "Inode table at <a>-<b>" line.

This means that block 1008028301 is part of this block group 30762, but
it is not listed in the range of "Block bitmap", "Inode bitmap", or "Inode
table" so it is just a regular block that was allocated to a file. You
can use "debugfs -c -R 'icheck 1008028301' /dev/dm-22" to find out the
inode this block belongs to, and "debugfs -c -R 'stat <inum>' /dev/dm-22"
to see what the block is used for (e.g. data block, indirect block, etc).

Unfortunately, I don't think "debugfs icheck" reports the type of block
that this is, even though it probably could.
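
For what it's worth, the "stat" output makes this reasonably obvious anyway; for
an extent-mapped file it ends with something like (values purely illustrative):

EXTENTS:
(0-11):1008028296-1008028307

so you can see directly whether 1008028301 falls inside one of the file's extents
(i.e. is an ordinary data block) or is referenced in some other way.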

Cheers, Andreas

> Another thing I found is that no block has been mentioned more than twice.
> When a block is mentioned twice, the two messages are within a second or so
> of each other. Maybe this is simply because they were only touched when I did the rsync.
>
> Here is an example:
>
> [303709.144704] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1085342550). There's a risk of filesystem corruption in case of system crash.
> [303709.145561] JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 1085342538). There's a risk of filesystem corruption in case of system crash.
>
>>
>>> No other message is logged. If I unmount the filesystem and do a forced
>>> fsck, all seems fine.
>> This is likely a bug in the code, marking a block dirty without setting
>> up the journaling for it correctly. A few bugs like this were fixed a
>> while ago by Eric, but those fixes should be in 4.8.8.
>>
>>> The filesystem is on an LV (which is in turn backed by DRBD), and so is
>>> its journal.
>>>
>>> I'm using stable kernel 4.8.8.
>>>
>>> I created the journal with:
>>> mkfs.ext4 -O journal_dev -v -b 4096 -L jdyn /dev/export/jdyn
>>>
>>> I created the filesystem with:
>>> mkfs.ext4 -J device=UUID=625d871f-c278-4acb-916d-774dc78dbd8a -v -b 4096
>>> -E
>>> stride=$((512/4)),stripe_width=$((512/4*3)),lazy_itable_init=0,nodiscard
>>> -O inline_data,mmp -L dyn /dev/export/dyn
>> It may be that inline_data is the culprit, since this is a relatively new
>> feature that isn't widely used yet.
>>
>> Cheers, Andreas
>>
>>> I tested it also with data=writeback. I didn't see these errors then.
>>>
>>> Regards,
>>> --
>>> Wolfgang Walter
>>> Studentenwerk München
>>> Anstalt des öffentlichen Rechts
>>
>> Cheers, Andreas
>
> Regards,
> --
> Wolfgang Walter
> Studentenwerk München
> Anstalt des öffentlichen Rechts



2016-11-28 11:26:38

by Wolfgang Walter

Subject: Re: JBD2: Spotted dirty metadata buffer....

On Wednesday, 23 November 2016, at 16:40:07, Andreas Dilger wrote:
> On Nov 23, 2016, at 3:43 AM, Wolfgang Walter <[email protected]> wrote:
> > On Tuesday, 22 November 2016, at 16:02:53, you wrote:
> >> On Nov 22, 2016, at 6:56 AM, Wolfgang Walter <[email protected]> wrote:
> >>> On Monday, 21 November 2016, at 17:49:36, you wrote:
> >>>> On Nov 21, 2016, at 8:28 AM, Wolfgang Walter <[email protected]> wrote:
> >>>>> Hello,

[snip]

> Stepping back a bit - does this problem only happen with an external
> journal device, or does it also happen with an internal journal?
>

So I tried that this weekend. I again got these messages:

JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 241763277). There's a risk of filesystem corruption in case of system crash.

So this also happens with an internal journal.

I think it has something to do with data=journal plus writing a lot of data fast while removing and/or moving files at the same time.

I then tried it again with an internal journal but without the option journal_async_commit. I got these messages again. This time, though, after running for some time, the rsync got stuck in D state. I waited more than an hour but it did not recover. According to iostat there was no IO any more. I couldn't access the filesystem any more; ls -l or touch got stuck in D state, too. I didn't get any kernel message, though. top reported one CPU (always the same one) at about 100% wa the whole time. No process but top itself seemed to be active.
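
(For the record, the kernel stacks of the stuck tasks can still be captured in
that state even without new log messages, e.g.:

cat /proc/$(pidof rsync)/stack   # in-kernel stack of the stuck rsync (assumes a single rsync pid)
echo w > /proc/sysrq-trigger     # dump all tasks in uninterruptible (D) state to dmesg; needs sysrq enabled
)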

I never had that with an external journal + journal_async_commit. When I ran bonnie++ and copied a large amount of data with dd at the same time I got some problems, but the filesystem recovered every time, and I got messages like this one:

INFO: task dd:16705 blocked for more than 120 seconds.
Tainted: G E 4.8.8-debian64.all+1.1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
dd D ffff9d70dfc18700 0 16705 16503 0x00000004
ffff9d686ab53000 ffff9d70d2897080 ffff9d713fffbb80 024280ca8d8d1e9c
ffff9d70af8ac000 ffff9d70af8aba58 ffff9d90a94840ec ffff9d686ab53000
00000000ffffffff ffff9d90a94840f0 ffffffff8e7ee6f1 ffff9d90a94840e8
Call Trace:
[<ffffffff8e7ee6f1>] ? schedule+0x31/0x80
[<ffffffff8e7ee98a>] ? schedule_preempt_disabled+0xa/0x10
[<ffffffff8e7f0574>] ? __mutex_lock_slowpath+0xb4/0x130
[<ffffffff8e7f060b>] ? mutex_lock+0x1b/0x30
[<ffffffffc042e32f>] ? __jbd2_log_wait_for_space+0x7f/0x1e0 [jbd2]
[<ffffffffc0428384>] ? add_transaction_credits+0x284/0x2a0 [jbd2]
[<ffffffff8e2eb31f>] ? add_timer+0x12f/0x220
[<ffffffffc04284ff>] ? start_this_handle+0x10f/0x420 [jbd2]
[<ffffffffc0428c99>] ? jbd2__journal_start+0xe9/0x1f0 [jbd2]
[<ffffffffc0530595>] ? ext4_setattr+0x205/0x880 [ext4]
[<ffffffff8e4226d9>] ? notify_change+0x229/0x430
[<ffffffff8e401cf2>] ? do_truncate+0x72/0xc0
[<ffffffff8e412fa8>] ? path_openat+0x388/0x14d0
[<ffffffff8e415371>] ? do_filp_open+0x91/0x100
[<ffffffff8e3b92cc>] ? handle_mm_fault+0xd1c/0x14b0
[<ffffffff8e4031d7>] ? do_sys_open+0x127/0x210
[<ffffffff8e7f2ab6>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8

INFO: task bonnie++:19693 blocked for more than 120 seconds.
Tainted: G E 4.8.8-debian64.all+1.1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
bonnie++ D ffff9d70dfad8700 0 19693 16248 0x00000000
ffff9d70b0287040 ffff9d70d289c0c0 ffffffff8e2c7266 ffff9d70b18e4000
ffff9d70b18e4000 ffff9d70b18e3aa0 ffff9d90a94840ec ffff9d70b0287040
00000000ffffffff ffff9d90a94840f0 ffffffff8e7ee6f1 ffff9d90a94840e8
Call Trace:
[<ffffffff8e2c7266>] ? mutex_optimistic_spin+0x176/0x1c0
[<ffffffff8e7ee6f1>] ? schedule+0x31/0x80
[<ffffffff8e7ee98a>] ? schedule_preempt_disabled+0xa/0x10
[<ffffffff8e7f0574>] ? __mutex_lock_slowpath+0xb4/0x130
[<ffffffff8e7f060b>] ? mutex_lock+0x1b/0x30
[<ffffffffc042e32f>] ? __jbd2_log_wait_for_space+0x7f/0x1e0 [jbd2]
[<ffffffffc0428384>] ? add_transaction_credits+0x284/0x2a0 [jbd2]
[<ffffffffc04284ff>] ? start_this_handle+0x10f/0x420 [jbd2]
[<ffffffffc0428c99>] ? jbd2__journal_start+0xe9/0x1f0 [jbd2]
[<ffffffffc052e2b5>] ? ext4_write_begin+0x125/0x4f0 [ext4]
[<ffffffff8e37f0f3>] ? generic_perform_write+0xd3/0x1d0
[<ffffffff8e381a46>] ? __generic_file_write_iter+0x196/0x1e0
[<ffffffffc05231b6>] ? ext4_file_write_iter+0x196/0x3d0 [ext4]
[<ffffffff8e40365b>] ? __vfs_write+0xeb/0x160
[<ffffffff8e4047f3>] ? vfs_write+0xb3/0x1a0
[<ffffffff8e405bf2>] ? SyS_write+0x52/0xc0
[<ffffffff8e7f2ab6>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8

But there was IO on both the filesystem and the journal (the device the filesystem is on is slower than the device the journal sits on).


Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts

2017-09-06 12:46:48

by Wolfgang Walter

Subject: Re: JBD2: Spotted dirty metadata buffer....

On Monday, 28 November 2016, at 12:26:38, Wolfgang Walter wrote:
> On Wednesday, 23 November 2016, at 16:40:07, Andreas Dilger wrote:
> > On Nov 23, 2016, at 3:43 AM, Wolfgang Walter <[email protected]> wrote:
> > > On Tuesday, 22 November 2016, at 16:02:53, you wrote:
> > >> On Nov 22, 2016, at 6:56 AM, Wolfgang Walter <[email protected]> wrote:
> > >>> On Monday, 21 November 2016, at 17:49:36, you wrote:
> > >>>> On Nov 21, 2016, at 8:28 AM, Wolfgang Walter <[email protected]> wrote:
> > >>>>> Hello,
>
> [snip]
>
> > Stepping back a bit - does this problem only happen with an external
> > journal device, or does it also happen with an internal journal?
> >
>
> So I tried that this weekend. I again got these messages:
>
> JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 241763277). There's a risk of filesystem corruption in case of system crash.
>
> So this also happens with an internal journal.
>

[snip]

I last tried with 4.9.46 and I still see that problem when rsyncing data to the filesystem: errors similar to

JBD2: Spotted dirty metadata buffer (dev = dm-25, blocknr = 1008028301). There's a risk of filesystem corruption in case of system crash.

A later filesystem check does not show any errors.


With 4.9.46 stable kernels I also sometimes get the following error:

EXT4-fs error (device dm-25): ext4_iget:4501: inode #74061557: comm rsync: checksum invalid
or
EXT4-fs error (device dm-25): ext4_iget:4501: inode #155844677: comm nfsd: checksum invalid

A filesystem check then says that the inode itself seems OK but that the checksum is indeed wrong.

These inodes are all inodes of very small files.

So I finally copied everything off and reinitialized the filesystem, but this time without -O inline_data.

Since then everything works fine. So I assume there is a problem with inodes and inline_data (at least up to 4.9.46), maybe only with data=journal.
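
(As an aside, whether an existing filesystem has the feature enabled can be
checked with something like:

tune2fs -l /dev/export/dyn | grep -i 'features'   # look for inline_data in the feature list
)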

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts

2017-09-06 18:07:11

by Andreas Dilger

Subject: Re: JBD2: Spotted dirty metadata buffer....

On Sep 6, 2017, at 6:46 AM, Wolfgang Walter <[email protected]> wrote:
>
> On Monday, 28 November 2016, at 12:26:38, Wolfgang Walter wrote:
>> On Wednesday, 23 November 2016, at 16:40:07, Andreas Dilger wrote:
>>> Stepping back a bit - does this problem only happen with an external
>>> journal device, or does it also happen with an internal journal?
>>
>> So I tried that this weekend. I again got these messages:
>>
>> JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 241763277). There's a risk of filesystem corruption in case of system crash.
>>
>> So this also happens with an internal journal.
>>
>
> [snip]
>
> I last tried with 4.9.46 and I still see that problem when rsyncing data to the filesystem: errors similar to
>
> JBD2: Spotted dirty metadata buffer (dev = dm-25, blocknr = 1008028301). There's a risk of filesystem corruption in case of system crash.
>
> A later filesystem check does not show any errors.
>
>
> With 4.9.46 stable kernels I also sometimes get the following error:
>
> EXT4-fs error (device dm-25): ext4_iget:4501: inode #74061557: comm rsync: checksum invalid
> or
> EXT4-fs error (device dm-25): ext4_iget:4501: inode #155844677: comm nfsd: checksum invalid
>
> A filesystem check then says that the inode itself seems OK but that the checksum is indeed wrong.
>
> These inodes are all inodes of very small files.
>
> So I finally copied everything off and reinitialized the filesystem, but this time without -O inline_data.
>
> Since then everything works fine. So I assume there is a problem with inodes and inline_data (at least up to 4.9.46), maybe only with data=journal.

Adding Tao Ma, since he is the original inline_data author, and it appears that Taobao
is using this feature, so they are probably interested in fixing any corruption.

Wolfgang, it would be useful if you could figure out what type of block the reported
blocknr=1008028301 is. You can use "debugfs -c -R 'icheck 1008028301' /dev/dm-25".
I'd suspect, given that this is an inline data problem, that this is an inode table
block, but it is good to be sure. It would also be useful to see if this inode
number correlates to one of the inodes that have bad checksums.
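
For example (the inode number below is just one of the ones from your checksum
errors, purely as an illustration):

debugfs -c -R 'icheck 1008028301' /dev/dm-25   # which inode, if any, owns that block
debugfs -c -R 'imap <74061557>' /dev/dm-25     # which inode-table block holds that inode

If the block "imap" reports matches one of the logged blocknr values, the dirty
buffer is indeed an inode table block.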

Cheers, Andreas





