2021-08-25 02:13:31

by yangerkun

Subject: [QUESTION] question for commit 2d01ddc86606 ("ext4: save error info to sb through journal if available")

Hi Jan,

I have a question about commit 2d01ddc86606 ("ext4: save error info to sb
through journal if available"). The commit describes how a checksum
failure can occur in the following case:

1. ext4_handle_error() calls ext4_commit_super(), which writes directly to
the superblock.
2. At the same time, a journalled update of the superblock is in progress.

However, after commit 05c2c00f3769 ("ext4: protect superblock
modifications with a buffer lock"), all updates to the superblock and
its checksum are protected by the buffer lock. It seems we can no longer
get a checksum error after that commit, so the journal logic in
flush_stashed_error_work() seems unnecessary.

Maybe I am missing something. Could you explain this in more detail?


Thanks,
Kun.


2021-08-25 10:26:23

by Jan Kara

Subject: Re: [QUESTION] question for commit 2d01ddc86606 ("ext4: save error info to sb through journal if available")


Hello Kun!

On Wed 25-08-21 10:13:03, yangerkun wrote:
> There is a question about 2d01ddc86606 ("ext4: save error info to sb through
> journal if available"). This commit describe that we can have checksum
> failure with follow case:
>
> 1. ext4_handle_error will call ext4_commit_super which write directly to the
> superblock
> 2. At the same time, jounalled update of the superblock is ongoing
>
> However, after commit 05c2c00f3769 ("ext4: protect superblock modifications
> with a buffer lock"), all the update for superblock and the csum will be
> protected with buffer lock. It seems we won't get a csum error after that
> commit and the journal logic in flush_stashed_error_work seems useless.
>
> Maybe there is something missing... Can you help to explain more for that...

You are correct that after commit 05c2c00f3769 the checksum will be
correct. However, there are also other problems that 2d01ddc86606 addresses
and that are mentioned in the commit description, like "writing inconsistent
information". The fundamental problem is that you cannot mix journalled and
non-journalled updates to any block. For example, an unjournalled update
could write to disk information that was changed only as part of the
currently running transaction; if the machine crashes before the
transaction commits, the block contains information that is too new and
the filesystem is inconsistent. In the other direction, journal replay can
overwrite unjournalled modifications to the superblock if we crash at the
right moment.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
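
The two failure modes described in the reply above can be illustrated with a
small user-space model. This is only a sketch, not ext4 or jbd2 code: every
name in it (toy_sb, toy_fs, toy_commit and so on) is hypothetical, the
"superblock" has just two fields, and a commit is assumed to log a full copy
of the block that recovery later replays over the on-disk copy.

```c
#include <assert.h>

/* Toy superblock with two fields (illustrative names, not ext4's). */
struct toy_sb {
    int free_blocks;  /* changed only inside journalled transactions */
    int error_count;  /* changed by the error handler */
};

struct toy_fs {
    struct toy_sb disk;     /* superblock at its home location on disk */
    struct toy_sb memory;   /* in-memory superblock buffer */
    struct toy_sb journal;  /* superblock copy logged by the last commit */
    int replay_pending;     /* committed to the journal, not yet checkpointed */
};

/* Unjournalled path: write the whole buffer straight to its home location. */
void toy_direct_write(struct toy_fs *fs)
{
    fs->disk = fs->memory;
}

/* Journalled path: commit logs the buffer; the home location is untouched. */
void toy_commit(struct toy_fs *fs)
{
    fs->journal = fs->memory;
    fs->replay_pending = 1;
}

/* Crash: memory is lost; recovery replays any committed journal copy. */
void toy_crash_and_recover(struct toy_fs *fs)
{
    if (fs->replay_pending) {
        fs->disk = fs->journal;
        fs->replay_pending = 0;
    }
    fs->memory = fs->disk;
}

struct toy_fs toy_mkfs(void)
{
    struct toy_sb sb = { .free_blocks = 100, .error_count = 0 };
    struct toy_fs fs = { .disk = sb, .memory = sb, .journal = sb,
                         .replay_pending = 0 };
    return fs;
}

/* Case 1: a direct write leaks state of a transaction that never commits. */
struct toy_sb toy_case_too_new(void)
{
    struct toy_fs fs = toy_mkfs();
    fs.memory.free_blocks = 90;   /* running transaction, not committed */
    fs.memory.error_count++;      /* error handler records an error ... */
    toy_direct_write(&fs);        /* ... and writes the block directly */
    toy_crash_and_recover(&fs);   /* crash before the commit */
    return fs.disk;
}

/* Case 2: journal replay overwrites an unjournalled update. */
struct toy_sb toy_case_replay_overwrite(void)
{
    struct toy_fs fs = toy_mkfs();
    fs.memory.free_blocks = 90;
    toy_commit(&fs);              /* journal copy has error_count == 0 */
    fs.memory.error_count++;
    toy_direct_write(&fs);        /* disk copy has error_count == 1 */
    toy_crash_and_recover(&fs);   /* replay restores the journal copy */
    return fs.disk;
}

/* Checks both outcomes; call this from main() to run the model. */
void toy_run_cases(void)
{
    /* Uncommitted free_blocks value survived the crash: too new. */
    assert(toy_case_too_new().free_blocks == 90);
    /* The recorded error was lost when the journal was replayed. */
    assert(toy_case_replay_overwrite().error_count == 0);
}
```

In this model the buffer lock from 05c2c00f3769 would make no difference:
each write is already atomic, so the checksum would always match, yet the
on-disk contents are still wrong after the crash in both cases.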

2021-08-26 01:07:55

by yangerkun

Subject: Re: [QUESTION] question for commit 2d01ddc86606 ("ext4: save error info to sb through journal if available")



On 2021/8/25 18:25, Jan Kara wrote:
>
> Hello Kun!
>
> On Wed 25-08-21 10:13:03, yangerkun wrote:
>> There is a question about 2d01ddc86606 ("ext4: save error info to sb through
>> journal if available"). This commit describe that we can have checksum
>> failure with follow case:
>>
>> 1. ext4_handle_error will call ext4_commit_super which write directly to the
>> superblock
>> 2. At the same time, jounalled update of the superblock is ongoing
>>
>> However, after commit 05c2c00f3769 ("ext4: protect superblock modifications
>> with a buffer lock"), all the update for superblock and the csum will be
>> protected with buffer lock. It seems we won't get a csum error after that
>> commit and the journal logic in flush_stashed_error_work seems useless.
>>
>> Maybe there is something missing... Can you help to explain more for that...
>
> You are correct that after commit 05c2c00f3769 the checksum will be
> correct. However there are also other problems that 2d01ddc86606 addresses
> and that are mentioned in the commit description like "writing inconsistent
> information". The fundamental problem is that you cannot mix journalled and
> non-journalled updates to any block. Because e.g. the unjournalled update
> could store to disk information that was changed only as part of the
> currently running transaction and if the machine crashes before the
> transaction commits, we have too new information in the block and thus
> inconsistent filesystem. Or in the other direction, journal replay can
> overwrite unjournalled modifications to the superblock if we crash at the
> right moment.

Got it! Thanks for your explanation!

>
> Honza
>