Hi.
Last night I had a kernel oops in fsnotify (see attached). It happened
when I clicked a hyperlink to a text file in Firefox (which normally
opens an open-with/save-file dialog).
I have not found a way to reproduce it.
The system runs kernel 3.5.2 with ext3. (note: kernel is patched with
overlayfs-v13).
Many thanks for your assistance.
~ Andy
==
On Sun, Aug 26, 2012 at 03:44:54PM -0500, Andrew Watts wrote:
> BUG: unable to handle kernel NULL pointer dereference at 00000064
> IP: [<c1109b7d>] fsnotify+0x8b/0x270
> *pde = 00000000
> Oops: 0000 [#1]
> Pid: 14083, comm: firefox Tainted: G O 3.5.2
> EIP: 0060:[<c1109b7d>] EFLAGS: 00210246 CPU: 0
> EIP is at fsnotify+0x8b/0x270
> EAX: 00000000 EBX: fffffff0 ECX: f5988910 EDX: f5988910
> ESI: 00000010 EDI: 00000000 EBP: dea1de5c ESP: dea1de14
> DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
> CR0: 80050033 CR2: 00000064 CR3: 34c52000 CR4: 000007d0
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> DR6: ffff0ff0 DR7: 00000400
> Process firefox (pid: 14083, ti=dea1c000 task=ed2d1880 task.ti=dea1c000)
> Stack:
> dea1de34 c10ee498 00000000 dea1deec dea1df78 00008000 00000001 c10f21c3
> eeb43688 f5988910 00000010 dea1de48 00000000 00000000 00000000 eeb43680
> 00000010 f6003600 dea1de8c c10de255 00000001 00000000 00000000 00000000
> Call Trace:
> [<c10ee498>] ? dput+0x156/0x1c5
> [<c10f21c3>] ? mntput+0x19/0x28
> [<c10de255>] fput+0x196/0x1ed
> [<c10e5435>] release_open_intent+0x1d/0x29
> [<c10e8b03>] path_openat+0xc5/0x33f
> [<c10e8e41>] do_filp_open+0x2a/0x79
> [<c10f105e>] ? alloc_fd+0x5c/0xcb
> [<c10e538a>] ? getname_flags+0x31/0xb1
> [<c10dc553>] do_sys_open+0xef/0x1da
> [<c10dc692>] sys_open+0x27/0x2f
> [<c1573f13>] sysenter_do_call+0x12/0x22
> [<c1560000>] ? netlbl_mgmt_add_common+0x1ec/0x306
> Code: 02 00 00 b8 20 82 93 c1 e8 41 35 f4 ff 89 45 d0 8b 4d dc 85 b1 24 01 00 00 0f 85 2b 01 00 00 85 db 0f 84 37 01 00 00 85 ff 75 09 <85> 73 74 0f 84 2a 01 00 00 8b 43 70 89 45 ec 8b 4d dc 8b 91 28
> EIP: [<c1109b7d>] fsnotify+0x8b/0x270 SS:ESP 0068:dea1de14
> CR2: 0000000000000064
> ---[ end trace b9a1d764aab1963e ]---
The problematic instruction seems to be this one:
85 73 74 test %esi,0x74(%ebx)
It corresponds to the marked line in the following code:
	if (!(mask & FS_MODIFY) &&
	    !(test_mask & to_tell->i_fsnotify_mask) &&
*	    !(mnt && test_mask & mnt->mnt_fsnotify_mask))
		return 0;
mnt (a 'struct mount *') is derived from a NULL 'struct vfsmount *',
so it ends up with the value 0xfffffff0, which is what is in ebx.
When ->mnt_fsnotify_mask (offset 0x74) is referenced, the access goes to
0xfffffff0 + 0x74 = 0x00000064 (wrapping around in 32 bits), which
accounts for the fault address.
But I have no idea how 'struct path' came to contain a NULL
'struct vfsmount *' ... ...
On Sun, Aug 26, 2012 at 03:44:54PM -0500, Andrew Watts wrote:
> Hi.
>
> The system runs kernel 3.5.2 with ext3. (note: kernel is patched with
> overlayfs-v13).
>
Thanks for pointing out that the kernel is patched with overlayfs. As
the crash is within the filesystem, it looks to be a bug with that
patch. You may want to report this bug to them.
-- Steve
On Mon, Aug 27, 2012 at 01:32:01PM +0800, Guo Chao wrote:
> Problematic instruction seems to be this one:
>
> 85 73 74 test %esi,0x74(%ebx)
Guo:
Thank you very much for the very detailed response on the meaning of the
oops - it was very educational for me. I did have one question, however.
How were you able to identify the relevant section in the source code?
> But have no idea how 'struct path' contained a NULL
> 'struct vfsmount *' ... ...
>
As Steven points out, this could be a bug in the overlayfs code. After
getting a second oops last night, I rebuilt 3.5.2 without overlayfs in
order to test that theory. Although I am unable to reproduce it at will,
two oopses in 60 hours suggest I should not have to wait too long for a
verdict.
I will follow up in a couple of days one way or the other.
~ Andy