LinuxLists.cc - Null pointer dereference of s_chksum_driver in ext4

2015-05-01 19:49:06

Subject: Null pointer dereference of s_chksum_driver in ext4_chksum

Hi

I am running the 3.10 (Android) kernel.

I have run into a couple of instances of a null pointer dereference
occurring in the function ext4_chksum.

This issue seems to be the same one as
https://bugzilla.kernel.org/show_bug.cgi?id=82201

I am not sure if this was ever solved?

Can someone kindly point me in the right direction?

The only patch I found that might be remotely related is
https://www.codeaurora.org/cgit/quic/la/kernel/msm-3.14/commit/?h=LA.HB.1.1.1_rb1.10&id=9cf666834cffdb450b9b18f3e06c30493cb40ed2

I am not entirely sure if this is the fix for the issue.

Please find additional details below:

This occurred while dereferencing the sbi->s_chksum_driver member of
the superblock info.

This occurs during a bootup mount:

10.216919: <6> EXT4-fs (mmcblk0p22): mounted filesystem with ordered data mode. Opts: barrier=1,discard
10.225032: <6> SELinux: initialized (dev mmcblk0p22, type ext4), uses xattr
10.235901: <6> EXT4-fs (mmcblk0p29): Ignoring removed nomblk_io_submit option
10.341141: <6> Unable to handle kernel NULL pointer dereference at virtual address 00000000

The call stack is as below:

[<ffffffc000393a54>] ext4_superblock_csum+0x20/0x68
10.498103: <2> [<ffffffc000393fc8>] ext4_superblock_csum_set+0x20/0x34
10.504353: <2> [<ffffffc00039455c>] ext4_commit_super+0x178/0x1f4
10.510170: <2> [<ffffffc0003945f4>] save_error_info+0x1c/0x2c
10.515638: <2> [<ffffffc000394954>] ext4_error_inode+0x4c/0x13c
10.521282: <2> [<ffffffc00037d510>] ext4_map_blocks+0x354/0x398
10.526924: <2> [<ffffffc00037e97c>] _ext4_get_block+0xc0/0x160
10.532479: <2> [<ffffffc00037ea2c>] ext4_get_block+0x10/0x1c
10.537863: <2> [<ffffffc00031e808>] generic_block_bmap+0x34/0x44
10.543589: <2> [<ffffffc00037b980>] ext4_bmap+0x78/0xd4
10.548539: <2> [<ffffffc00030a2ec>] bmap+0x20/0x2c
10.553052: <2> [<ffffffc0003c8ec0>] jbd2_journal_bmap+0x24/0x9c
10.558695: <2> [<ffffffc0003c311c>] jread+0x54/0x228
10.563381: <2> [<ffffffc0003c3618>] do_one_pass+0x328/0x724
10.568678: <2> [<ffffffc0003c3a8c>] jbd2_journal_recover+0x78/0xdc
10.574580: <2> [<ffffffc0003c8c80>] jbd2_journal_load+0x154/0x308
10.580396: <2> [<ffffffc000398168>] ext4_fill_super+0x1984/0x2470
10.586211: <2> [<ffffffc0002f8634>] mount_bdev+0x134/0x1b8
10.591420: <2> [<ffffffc000392f18>] ext4_mount+0x10/0x1c
10.596454: <2> [<ffffffc0002f8ebc>] mount_fs+0x78/0x174
10.601404: <2> [<ffffffc00030f420>] vfs_kern_mount+0x58/0xcc
10.606785: <2> [<ffffffc000311748>] do_mount+0x6f0/0x7d4
10.611819: <2> [<ffffffc0003118b8>] SyS_mount+0x8c/0xd0
10.616768: <6> Code: 9100fff3 f9420000 927ae673 f942340(b9400002)
10.622935: <6> ---[ end trace 69fa2927148e4ec2 ]---
10.627528: <6> Kernel panic - not syncing: Fatal exception

--
Thanks
Nikhilesh Reddy

Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

2015-05-03 21:37:02

by Theodore Ts'o

Subject: Re: Null pointer dereference of s_chksum_driver in ext4_chksum

On Sun, May 03, 2015 at 01:11:58PM -0700, Nikhilesh Reddy wrote:
> Please find additional details below:
>
> This occurred in while de-referencing the sbi->s_chksum_driver member of
> the superblock info.
>
> This occurs during a bootup mount
>
> 10.216919: <6> EXT4-fs (mmcblk0p22): mounted filesystem with ordered
> data mode. Opts: barrier=1,discard
> 10.225032: <6> SELinux: initialized (dev mmcblk0p22, type ext4),
> uses xattr
> 10.235901: <6> EXT4-fs (mmcblk0p29): Ignoring removed
> nomblk_io_submit option
> 10.341141: <6> Unable to handle kernel NULL pointer dereference
> at virtual address 00000000

I'd have to actually see the full file system to understand what is
going on, but what I suspect is happening is that the file system has
been corrupted in at least two different ways. The first is that
the journal inode is corrupted; this is what's causing the call
to ext4_error_inode() from a call to jbd2_journal_bmap().

The *second* thing which is going on is that before we noticed the
corrupted journal inode, the journal contained a copy of the
superblock which we replayed that _set_ the metadata checksum feature
flag. Since it wasn't set when the file system was initially
mounted, s_chksum_driver wasn't initialized, and this causes the NULL
pointer dereference.

Avoiding the kernel crash was fixed by accident in 3.18 with the
following commit: 9aa5d32ba269 ("Replace open coded mdata csum feature
to helper function"), since instead of actually checking to see if the
metadata checksum field is set, it uses as its primary mechanism
checking to see if s_chksum_driver is non-NULL. There is a
WARN_ON_ONCE that will trip in the situation where the feature flag is
set and s_chksum_driver is NULL, but that really is a "should never
happen" situation. The only scenario I can think of where this might
have happened is the one I described above, where it was enabled by a
journal replay.

This should be sufficient to avoid the crash, but I haven't had the
chance to try creating a file system corrupted the way I conjecture it
was corrupted, and see whether we correctly fail the mount (which
is clearly what should happen if we discover a corrupted journal inode
while replaying the journal during the mount.)

- Ted

2015-05-04 19:35:51

by Nikhilesh Reddy

Subject: Re: Null pointer dereference of s_chksum_driver in ext4_chksum

Thank you so much Ted.
Will try to identify the root cause of the journal inode corruption.

And thanks for pointing me to commit 9aa5d32ba269.

--
Thanks
Nikhilesh Reddy

Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.