Subject: Re: Fw: 2.6.17 oops, possibly ntfs/mmap related
From: Anton Altaparmakov <aia21@cam.ac.uk>
To: Jonathan Woithe <jwoithe@physics.adelaide.edu.au>
Cc: linux-kernel@vger.kernel.org, Andrew Morton <akpm@osdl.org>
In-Reply-To: <20060912205602.57568b2a.akpm@osdl.org>
References: <20060912205602.57568b2a.akpm@osdl.org>
Content-Type: text/plain
Organization: Computing Service, University of Cambridge, UK
Date: Thu, 21 Sep 2006 10:54:43 +0100
Message-Id: <1158832483.5958.7.camel@imp.csi.cam.ac.uk>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 5188
Lines: 114

Hi,

On Tue, 2006-09-12 at 20:56 -0700, Andrew Morton wrote:

Andrew, thanks for forwarding me the message...

> Begin forwarded message:
> 
> Date: Wed, 13 Sep 2006 10:05:42 +0930 (CST)
> From: Jonathan Woithe <jwoithe@physics.adelaide.edu.au>
> To: linux-kernel@vger.kernel.org
> Cc: jwoithe@physics.adelaide.edu.au (Jonathan Woithe)
> Subject: 2.6.17 oops, possibly ntfs/mmap related
> 
> 
> We have a machine which is currently making heavy use of a usb hard disc
> formatted with ntfs.  There have been two occasions where the kernel has
> oopsed while this disc was being accessed heavily.  Before adding this HDD
> the machine in question was rock solid which leads me to think that it
> might be related to ntfs.  USB drives formatted with other filesystems do
> not appear to suffer from this problem.
> 
> Unfortunately bogofilter considers the oops reports as spam so I cannot post
> them to the list.  I have instead put the full text of my original post
> regarding this topic on the web at
> 
>   http://www.atrad.com.au/~jwoithe/kernel/oopses-20060913.txt
> 
> I'm happy to try things to narrow down the cause if it will help.
> 
> Please CC me on reply.

These were the oopses from above text url:

> The first oops caused the machine to totally lock up:
> 
>   BUG: unable to handle kernel paging request at virtual address e4004de0
>    printing eip:
>   c012de0c
>   *pde = 00000000
>   Oops: 0000 [#1]
>   Modules linked in: ntfs 8139too via_agp agpgart usb_storage ehci_hcd uhci_hcd usbcore
>   CPU:    0
>   EIP:    0060:[<c012de0c>]    Not tainted VLI
>   EFLAGS: 00010082   (2.6.17 #2) 
>   EIP is at find_get_page+0x11/0x22
>   eax: e4004de0   ebx: c02f01a8   ecx: e4004de0   edx: e4004de0
>   esi: 00000000   edi: 00000066   ebp: cfc20574   esp: c770bee8
>   ds: 007b   es: 007b   ss: 0068
>   Process sh (pid: 10467, threadinfo=c770a000 task=cfa495c0)
>   Stack: c012ea09 00000002 00000000 cfc204d8 cff882a4 cff88260 c770bf30 c5962544
>          c02f01a8 00000000 ceec10a0 080aef10 c01385d1 00000000 cfc20574 080aef10
>          c5962544 ceec10a0 00000002 c7aeb080 080aef10 ceec10a0 080aef10 c013886a
>   Call Trace:
>    <c012ea09> filemap_nopage+0x98/0x2b2  <c01385d1> do_no_page+0x6d/0x1e1
>    <c013886a> __handle_mm_fault+0xc4/0x162  <c0112190> do_page_fault+0x23e/0x56b
>    <c01c43c1> copy_to_user+0x41/0x49  <c0111f52> do_page_fault+0x0/0x56b
>    <c010342f> error_code+0x4f/0x54 
>   Code: a0 fe ff ff 89 ea b9 e2 d7 12 c0 6a 02 e8 5f ec 15 00 83 c4 44 5b 5e 5f 5d c3 fa 83 c0 04 e8 2c 3f 09 00 85 c0 89 c1 74 0f 89 c2 <8b> 00 f6 c4 40 74 03 8b 51 0c ff 42 04 fb 89 c8 c3 fa 83 c0 04 
> 
> 
> In the case of the second oops the machine was still partially usable and a
> clean shutdown was possible.  However, services such as sshd were no longer
> responding.
> 
>   BUG: unable to handle kernel paging request at virtual address 0010c744
>    printing eip:
>   c013be50
>   *pde = 00000000
>   Oops: 0002 [#1]
>   Modules linked in: ntfs 8139too via_agp agpgart usb_storage ehci_hcd uhci_hcd usbcore
>   CPU:    0
>   EIP:    0060:[<c013be50>]    Tainted: G   M  VLI
>   EFLAGS: 00010282   (2.6.17 #2) 
>   EIP is at anon_vma_unlink+0x16/0x3c
>   eax: 0010c740   ebx: cf1070cc   ecx: cf107104   edx: cf8bc740
>   esi: cf8bc740   edi: b7e82000   ebp: 00000000   esp: cdad7f58
>   ds: 007b   es: 007b   ss: 0068
>   Process sh (pid: 20272, threadinfo=cdad6000 task=c0d8d580)
>   Stack: cf1070cc cf61f3e4 c0136b5f cdad7f80 c4084b74 cf8b5860 00000001 00000000
>          c013ab92 00000000 c0371b7c 000000b9 cf8b5860 c0d8d580 c01145dd cdad6000
>          c0118187 cdad6000 00000000 00000000 cdad6000 c0118380 00000000 b7f9968c
>   Call Trace:
>    <c0136b5f> free_pgtables+0x41/0x82  <c013ab92> exit_mmap+0x6a/0xb8
>    <c01145dd> mmput+0x1b/0x5e  <c0118187> do_exit+0x14e/0x2d1
>    <c0118380> sys_exit_group+0x0/0xd  <c010299b> syscall_call+0x7/0xb
>   Code: c9 74 10 8b 11 8d 40 38 89 42 04 89 53 38 89 48 04 89 01 5b c3 56 53 8b 70 40 89 c3 85 f6 74 2e 8d 48 38 8b 40 38 8b 51 04 89 02 <89> 50 04 c7 43 38 00 01 10 00 39 36 c7 41 04 00 02 20 00 75 0e 
>   EIP: [<c013be50>] anon_vma_unlink+0x16/0x3c SS:ESP 0068:cdad7f58
>    <1>Fixing recursive fault but reboot is needed!
> 
> I'm not entirely sure why the kernel considered itself tainted in the second
> oops and not in the first - the setup hadn't changed and precisely the same
> kernel modules were loaded.  This machine does not have any external (ie:
> out-of-tree) modules installed.

Weird.  The traces do not include NTFS at all in them so I have no idea
if NTFS has anything to do with this or not...

Anyone have any idea what these oopses mean?!?

Best regards,

        Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://www.linux-ntfs.org/ & http://www-stu.christs.cam.ac.uk/~aia21/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/