From: Zhang Yi <[email protected]>
We got a NULL pointer dereference issue below while running generic/475
I/O failure pressure test.
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP PTI
CPU: 1 PID: 15600 Comm: fsstress Not tainted 6.4.0-rc5-xfstests-00055-gd3ab1bca26b4 #190
RIP: 0010:jbd2_journal_set_features+0x13d/0x430
...
Call Trace:
<TASK>
? __die+0x23/0x60
? page_fault_oops+0xa4/0x170
? exc_page_fault+0x67/0x170
? asm_exc_page_fault+0x26/0x30
? jbd2_journal_set_features+0x13d/0x430
jbd2_journal_revoke+0x47/0x1e0
__ext4_forget+0xc3/0x1b0
ext4_free_blocks+0x214/0x2f0
ext4_free_branches+0xeb/0x270
ext4_ind_truncate+0x2bf/0x320
ext4_truncate+0x1e4/0x490
ext4_handle_inode_extension+0x1bd/0x2a0
? iomap_dio_complete+0xaf/0x1d0
The root cause is the journal super block had been failed to write out
due to I/O fault injection, it's uptodate bit was cleared by
end_buffer_write_sync() and didn't reset yet in jbd2_write_superblock().
And it raced by journal_get_superblock()->bh_read(), unfortunately, the
read IO is also failed, so the error handling in
journal_fail_superblock() unexpectedly clear the journal->j_sb_buffer,
finally lead to above NULL pointer dereference issue.
If the journal super block had been read and verified, there is no need
to call bh_read() read it again even if it has been failed to written
out. So the fix could be simply move buffer_verified(bh) in front of
bh_read(). Also remove a stale comment left in
jbd2_journal_check_used_features().
Fixes: 51bacdba23d8 ("jbd2: factor out journal initialization from journal_get_superblock()")
Reported-by: Theodore Ts'o <[email protected]>
Signed-off-by: Zhang Yi <[email protected]>
---
v1->v2:
- Remove a stale comment left in jbd2_journal_check_used_features().
fs/jbd2/journal.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index b5e57735ab3f..559242df0f9a 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1919,6 +1919,9 @@ static int journal_get_superblock(journal_t *journal)
bh = journal->j_sb_buffer;
J_ASSERT(bh != NULL);
+ if (buffer_verified(bh))
+ return 0;
+
err = bh_read(bh, 0);
if (err < 0) {
printk(KERN_ERR
@@ -1926,9 +1929,6 @@ static int journal_get_superblock(journal_t *journal)
goto out;
}
- if (buffer_verified(bh))
- return 0;
-
sb = journal->j_superblock;
err = -EINVAL;
@@ -2229,7 +2229,6 @@ int jbd2_journal_check_used_features(journal_t *journal, unsigned long compat,
if (!compat && !ro && !incompat)
return 1;
- /* Load journal superblock if it is not loaded yet. */
if (journal_get_superblock(journal))
return 0;
if (!jbd2_format_support_feature(journal))
--
2.39.2
On Fri 16-06-23 09:55:47, Zhang Yi wrote:
> From: Zhang Yi <[email protected]>
>
> We got a NULL pointer dereference issue below while running generic/475
> I/O failure pressure test.
>
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> #PF: supervisor write access in kernel mode
> #PF: error_code(0x0002) - not-present page
> PGD 0 P4D 0
> Oops: 0002 [#1] PREEMPT SMP PTI
> CPU: 1 PID: 15600 Comm: fsstress Not tainted 6.4.0-rc5-xfstests-00055-gd3ab1bca26b4 #190
> RIP: 0010:jbd2_journal_set_features+0x13d/0x430
> ...
> Call Trace:
> <TASK>
> ? __die+0x23/0x60
> ? page_fault_oops+0xa4/0x170
> ? exc_page_fault+0x67/0x170
> ? asm_exc_page_fault+0x26/0x30
> ? jbd2_journal_set_features+0x13d/0x430
> jbd2_journal_revoke+0x47/0x1e0
> __ext4_forget+0xc3/0x1b0
> ext4_free_blocks+0x214/0x2f0
> ext4_free_branches+0xeb/0x270
> ext4_ind_truncate+0x2bf/0x320
> ext4_truncate+0x1e4/0x490
> ext4_handle_inode_extension+0x1bd/0x2a0
> ? iomap_dio_complete+0xaf/0x1d0
>
> The root cause is the journal super block had been failed to write out
> due to I/O fault injection, it's uptodate bit was cleared by
> end_buffer_write_sync() and didn't reset yet in jbd2_write_superblock().
> And it raced by journal_get_superblock()->bh_read(), unfortunately, the
> read IO is also failed, so the error handling in
> journal_fail_superblock() unexpectedly clear the journal->j_sb_buffer,
> finally lead to above NULL pointer dereference issue.
>
> If the journal super block had been read and verified, there is no need
> to call bh_read() read it again even if it has been failed to written
> out. So the fix could be simply move buffer_verified(bh) in front of
> bh_read(). Also remove a stale comment left in
> jbd2_journal_check_used_features().
>
> Fixes: 51bacdba23d8 ("jbd2: factor out journal initialization from journal_get_superblock()")
> Reported-by: Theodore Ts'o <[email protected]>
> Signed-off-by: Zhang Yi <[email protected]>
This works as a workaround. It is a bit kludgy but for now I guess it is
good enough. Thanks for the fix and feel free to add:
Reviewed-by: Jan Kara <[email protected]>
Honza
> ---
> v1->v2:
> - Remove a stale comment left in jbd2_journal_check_used_features().
>
> fs/jbd2/journal.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index b5e57735ab3f..559242df0f9a 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -1919,6 +1919,9 @@ static int journal_get_superblock(journal_t *journal)
> bh = journal->j_sb_buffer;
>
> J_ASSERT(bh != NULL);
> + if (buffer_verified(bh))
> + return 0;
> +
> err = bh_read(bh, 0);
> if (err < 0) {
> printk(KERN_ERR
> @@ -1926,9 +1929,6 @@ static int journal_get_superblock(journal_t *journal)
> goto out;
> }
>
> - if (buffer_verified(bh))
> - return 0;
> -
> sb = journal->j_superblock;
>
> err = -EINVAL;
> @@ -2229,7 +2229,6 @@ int jbd2_journal_check_used_features(journal_t *journal, unsigned long compat,
>
> if (!compat && !ro && !incompat)
> return 1;
> - /* Load journal superblock if it is not loaded yet. */
> if (journal_get_superblock(journal))
> return 0;
> if (!jbd2_format_support_feature(journal))
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On 2023/6/16 21:27, Jan Kara wrote:
> On Fri 16-06-23 09:55:47, Zhang Yi wrote:
>> From: Zhang Yi <[email protected]>
>>
>> We got a NULL pointer dereference issue below while running generic/475
>> I/O failure pressure test.
>>
>> BUG: kernel NULL pointer dereference, address: 0000000000000000
>> #PF: supervisor write access in kernel mode
>> #PF: error_code(0x0002) - not-present page
>> PGD 0 P4D 0
>> Oops: 0002 [#1] PREEMPT SMP PTI
>> CPU: 1 PID: 15600 Comm: fsstress Not tainted 6.4.0-rc5-xfstests-00055-gd3ab1bca26b4 #190
>> RIP: 0010:jbd2_journal_set_features+0x13d/0x430
>> ...
>> Call Trace:
>> <TASK>
>> ? __die+0x23/0x60
>> ? page_fault_oops+0xa4/0x170
>> ? exc_page_fault+0x67/0x170
>> ? asm_exc_page_fault+0x26/0x30
>> ? jbd2_journal_set_features+0x13d/0x430
>> jbd2_journal_revoke+0x47/0x1e0
>> __ext4_forget+0xc3/0x1b0
>> ext4_free_blocks+0x214/0x2f0
>> ext4_free_branches+0xeb/0x270
>> ext4_ind_truncate+0x2bf/0x320
>> ext4_truncate+0x1e4/0x490
>> ext4_handle_inode_extension+0x1bd/0x2a0
>> ? iomap_dio_complete+0xaf/0x1d0
>>
>> The root cause is the journal super block had been failed to write out
>> due to I/O fault injection, it's uptodate bit was cleared by
>> end_buffer_write_sync() and didn't reset yet in jbd2_write_superblock().
>> And it raced by journal_get_superblock()->bh_read(), unfortunately, the
>> read IO is also failed, so the error handling in
>> journal_fail_superblock() unexpectedly clear the journal->j_sb_buffer,
>> finally lead to above NULL pointer dereference issue.
>>
>> If the journal super block had been read and verified, there is no need
>> to call bh_read() read it again even if it has been failed to written
>> out. So the fix could be simply move buffer_verified(bh) in front of
>> bh_read(). Also remove a stale comment left in
>> jbd2_journal_check_used_features().
>>
>> Fixes: 51bacdba23d8 ("jbd2: factor out journal initialization from journal_get_superblock()")
>> Reported-by: Theodore Ts'o <[email protected]>
>> Signed-off-by: Zhang Yi <[email protected]>
>
> This works as a workaround. It is a bit kludgy but for now I guess it is
> good enough. Thanks for the fix and feel free to add:
>
Thanks for the review. Yes, I suppose it's better to find a way to adjust
the sequence of journal load and feature checking in ocfs2_check_volume(),
so that we could completely remove the journal_get_superblock() in
jbd2_journal_check_used_features().
Thanks,
Yi.
>
>
>> ---
>> v1->v2:
>> - Remove a stale comment left in jbd2_journal_check_used_features().
>>
>> fs/jbd2/journal.c | 7 +++----
>> 1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
>> index b5e57735ab3f..559242df0f9a 100644
>> --- a/fs/jbd2/journal.c
>> +++ b/fs/jbd2/journal.c
>> @@ -1919,6 +1919,9 @@ static int journal_get_superblock(journal_t *journal)
>> bh = journal->j_sb_buffer;
>>
>> J_ASSERT(bh != NULL);
>> + if (buffer_verified(bh))
>> + return 0;
>> +
>> err = bh_read(bh, 0);
>> if (err < 0) {
>> printk(KERN_ERR
>> @@ -1926,9 +1929,6 @@ static int journal_get_superblock(journal_t *journal)
>> goto out;
>> }
>>
>> - if (buffer_verified(bh))
>> - return 0;
>> -
>> sb = journal->j_superblock;
>>
>> err = -EINVAL;
>> @@ -2229,7 +2229,6 @@ int jbd2_journal_check_used_features(journal_t *journal, unsigned long compat,
>>
>> if (!compat && !ro && !incompat)
>> return 1;
>> - /* Load journal superblock if it is not loaded yet. */
>> if (journal_get_superblock(journal))
>> return 0;
>> if (!jbd2_format_support_feature(journal))
>> --
>> 2.39.2
>>
On Sat, Jun 17, 2023 at 10:42:59AM +0800, Zhang Yi wrote:
> > This works as a workaround. It is a bit kludgy but for now I guess it is
> > good enough. Thanks for the fix and feel free to add:
>
> Thanks for the review. Yes, I suppose it's better to find a way to adjust
> the sequence of journal load and feature checking in ocfs2_check_volume(),
> so that we could completely remove the journal_get_superblock() in
> jbd2_journal_check_used_features().
Indeed, thanks for the fix.
This is would be for after the merge window, but I think we can clean
this up in the jbd2 layer by simply moving the call to
load_superblock() from jbd2_journal_load() and jbd2_journal_wipe() to
journal_init_common(). This change would mean the journal superblock
gets read as part of the call to jbd2_journal_init_{dev,inode}.
That way, once the file system has a journal_t object, it's guaranteed
that the j_sb_buffer contains valid data, and so we can drop the call
to journal_get_superblock() from jbd2_journal_check_used_features().
And after we do that, we should be able to inline the code in
load_superblock() and journal_get_superblock() into
journal_init_common(), which would simplify things in
jfs/jbd2/journal.c
Finally, so we can provide better error handling, we could change
Jbd2_journal_init_{dev,inode} to return an ERR_PTR instead of a NULL
if there is a failure. And since it's a good idea to change the
function name when changing the function signature, we could rename
those functions to something like jbd2_open_{dev,inode} at the same
time.
- Ted
P.S. The only reason why we don't load the superblock in
jbd2_journal_init_{dev,common} was that back in 2001, it was possible
to create the journal by creating a zero length file in the file
system, noting the inode number of the file system, unmounting the
file system from ext2, and then remounting it with "mount -t ext3 -o
journal=NNN ...". In order to do this, the ext3 file system code
called journal_init_inode() with the inode, and then follow it up with
a call to journal_create(), which would actually write out the journal
superblock. For that reason, journal_init_inode() had to avoid
reading the journal superblock, since it might not be initialized yet.
We removed jbd2_journal_create() from fs/jbd2 back in 2009, and it
hadn't been in use for quite a while before that --- in fact, I'm not
sure ext4 ever supported this ext3-style "let's create a journal
without e2fsprogs support because Stephen Tweedie was implementing the
ext3 journal kernel code without wanting to make changes to e2fsprogs
first" feature. :-)
On 2023/6/18 2:50, Theodore Ts'o wrote:
> On Sat, Jun 17, 2023 at 10:42:59AM +0800, Zhang Yi wrote:
>>> This works as a workaround. It is a bit kludgy but for now I guess it is
>>> good enough. Thanks for the fix and feel free to add:
>>
>> Thanks for the review. Yes, I suppose it's better to find a way to adjust
>> the sequence of journal load and feature checking in ocfs2_check_volume(),
>> so that we could completely remove the journal_get_superblock() in
>> jbd2_journal_check_used_features().
>
> Indeed, thanks for the fix.
>
> This is would be for after the merge window, but I think we can clean
> this up in the jbd2 layer by simply moving the call to
> load_superblock() from jbd2_journal_load() and jbd2_journal_wipe() to
> journal_init_common(). This change would mean the journal superblock
> gets read as part of the call to jbd2_journal_init_{dev,inode}.
>
> That way, once the file system has a journal_t object, it's guaranteed
> that the j_sb_buffer contains valid data, and so we can drop the call
> to journal_get_superblock() from jbd2_journal_check_used_features().
>
> And after we do that, we should be able to inline the code in
> load_superblock() and journal_get_superblock() into
> journal_init_common(), which would simplify things in
> jfs/jbd2/journal.c
>
> Finally, so we can provide better error handling, we could change
> Jbd2_journal_init_{dev,inode} to return an ERR_PTR instead of a NULL
> if there is a failure. And since it's a good idea to change the
> function name when changing the function signature, we could rename
> those functions to something like jbd2_open_{dev,inode} at the same
> time.
>
> - Ted
>
> P.S. The only reason why we don't load the superblock in
> jbd2_journal_init_{dev,common} was that back in 2001, it was possible
> to create the journal by creating a zero length file in the file
> system, noting the inode number of the file system, unmounting the
> file system from ext2, and then remounting it with "mount -t ext3 -o
> journal=NNN ...". In order to do this, the ext3 file system code
> called journal_init_inode() with the inode, and then follow it up with
> a call to journal_create(), which would actually write out the journal
> superblock. For that reason, journal_init_inode() had to avoid
> reading the journal superblock, since it might not be initialized yet.
>
> We removed jbd2_journal_create() from fs/jbd2 back in 2009, and it
> hadn't been in use for quite a while before that --- in fact, I'm not
> sure ext4 ever supported this ext3-style "let's create a journal
> without e2fsprogs support because Stephen Tweedie was implementing the
> ext3 journal kernel code without wanting to make changes to e2fsprogs
> first" feature. :-)
>
Thanks for the suggestion and historical details, we could do the cleanup
in jbd2 layer after the merge window.
Thanks,
Yi.