Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751082AbWIUJyv (ORCPT ); Thu, 21 Sep 2006 05:54:51 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751101AbWIUJyv (ORCPT ); Thu, 21 Sep 2006 05:54:51 -0400 Received: from ppsw-4.csi.cam.ac.uk ([131.111.8.134]:3821 "EHLO ppsw-4.csi.cam.ac.uk") by vger.kernel.org with ESMTP id S1751082AbWIUJyu (ORCPT ); Thu, 21 Sep 2006 05:54:50 -0400 X-Cam-SpamDetails: Not scanned X-Cam-AntiVirus: No virus found X-Cam-ScannerInfo: http://www.cam.ac.uk/cs/email/scanner/ Subject: Re: Fw: 2.6.17 oops, possibly ntfs/mmap related From: Anton Altaparmakov To: Jonathan Woithe Cc: linux-kernel@vger.kernel.org, Andrew Morton In-Reply-To: <20060912205602.57568b2a.akpm@osdl.org> References: <20060912205602.57568b2a.akpm@osdl.org> Content-Type: text/plain Organization: Computing Service, University of Cambridge, UK Date: Thu, 21 Sep 2006 10:54:43 +0100 Message-Id: <1158832483.5958.7.camel@imp.csi.cam.ac.uk> Mime-Version: 1.0 X-Mailer: Evolution 2.6.0 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5188 Lines: 114 Hi, On Tue, 2006-09-12 at 20:56 -0700, Andrew Morton wrote: Andrew, thanks for forwarding me the message... > Begin forwarded message: > > Date: Wed, 13 Sep 2006 10:05:42 +0930 (CST) > From: Jonathan Woithe > To: linux-kernel@vger.kernel.org > Cc: jwoithe@physics.adelaide.edu.au (Jonathan Woithe) > Subject: 2.6.17 oops, possibly ntfs/mmap related > > > We have a machine which is currently making heavy use of a usb hard disc > formatted with ntfs. There have been two occasions where the kernel has > oopsed while this disc was being accessed heavily. Before adding this HDD > the machine in question was rock solid which leads me to think that it > might be related to ntfs. USB drives formatted with other filesystems do > not appear to suffer from this problem. > > Unfortunately bogofilter considers the oops reports as spam so I cannot post > them to the list. I have instead put the full text of my original post > regarding this topic on the web at > > http://www.atrad.com.au/~jwoithe/kernel/oopses-20060913.txt > > I'm happy to try things to narrow down the cause if it will help. > > Please CC me on reply. These were the oopses from above text url: > The first oops caused the machine to totally lock up: > > BUG: unable to handle kernel paging request at virtual address e4004de0 > printing eip: > c012de0c > *pde = 00000000 > Oops: 0000 [#1] > Modules linked in: ntfs 8139too via_agp agpgart usb_storage ehci_hcd uhci_hcd usbcore > CPU: 0 > EIP: 0060:[] Not tainted VLI > EFLAGS: 00010082 (2.6.17 #2) > EIP is at find_get_page+0x11/0x22 > eax: e4004de0 ebx: c02f01a8 ecx: e4004de0 edx: e4004de0 > esi: 00000000 edi: 00000066 ebp: cfc20574 esp: c770bee8 > ds: 007b es: 007b ss: 0068 > Process sh (pid: 10467, threadinfo=c770a000 task=cfa495c0) > Stack: c012ea09 00000002 00000000 cfc204d8 cff882a4 cff88260 c770bf30 c5962544 > c02f01a8 00000000 ceec10a0 080aef10 c01385d1 00000000 cfc20574 080aef10 > c5962544 ceec10a0 00000002 c7aeb080 080aef10 ceec10a0 080aef10 c013886a > Call Trace: > filemap_nopage+0x98/0x2b2 do_no_page+0x6d/0x1e1 > __handle_mm_fault+0xc4/0x162 do_page_fault+0x23e/0x56b > copy_to_user+0x41/0x49 do_page_fault+0x0/0x56b > error_code+0x4f/0x54 > Code: a0 fe ff ff 89 ea b9 e2 d7 12 c0 6a 02 e8 5f ec 15 00 83 c4 44 5b 5e 5f 5d c3 fa 83 c0 04 e8 2c 3f 09 00 85 c0 89 c1 74 0f 89 c2 <8b> 00 f6 c4 40 74 03 8b 51 0c ff 42 04 fb 89 c8 c3 fa 83 c0 04 > > > In the case of the second oops the machine was still partially usable and a > clean shutdown was possible. However, services such as sshd were no longer > responding. > > BUG: unable to handle kernel paging request at virtual address 0010c744 > printing eip: > c013be50 > *pde = 00000000 > Oops: 0002 [#1] > Modules linked in: ntfs 8139too via_agp agpgart usb_storage ehci_hcd uhci_hcd usbcore > CPU: 0 > EIP: 0060:[] Tainted: G M VLI > EFLAGS: 00010282 (2.6.17 #2) > EIP is at anon_vma_unlink+0x16/0x3c > eax: 0010c740 ebx: cf1070cc ecx: cf107104 edx: cf8bc740 > esi: cf8bc740 edi: b7e82000 ebp: 00000000 esp: cdad7f58 > ds: 007b es: 007b ss: 0068 > Process sh (pid: 20272, threadinfo=cdad6000 task=c0d8d580) > Stack: cf1070cc cf61f3e4 c0136b5f cdad7f80 c4084b74 cf8b5860 00000001 00000000 > c013ab92 00000000 c0371b7c 000000b9 cf8b5860 c0d8d580 c01145dd cdad6000 > c0118187 cdad6000 00000000 00000000 cdad6000 c0118380 00000000 b7f9968c > Call Trace: > free_pgtables+0x41/0x82 exit_mmap+0x6a/0xb8 > mmput+0x1b/0x5e do_exit+0x14e/0x2d1 > sys_exit_group+0x0/0xd syscall_call+0x7/0xb > Code: c9 74 10 8b 11 8d 40 38 89 42 04 89 53 38 89 48 04 89 01 5b c3 56 53 8b 70 40 89 c3 85 f6 74 2e 8d 48 38 8b 40 38 8b 51 04 89 02 <89> 50 04 c7 43 38 00 01 10 00 39 36 c7 41 04 00 02 20 00 75 0e > EIP: [] anon_vma_unlink+0x16/0x3c SS:ESP 0068:cdad7f58 > <1>Fixing recursive fault but reboot is needed! > > I'm not entirely sure why the kernel considered itself tainted in the second > oops and not in the first - the setup hadn't changed and precisely the same > kernel modules were loaded. This machine does not have any external (ie: > out-of-tree) modules installed. Weird. The traces do not include NTFS at all in them so I have no idea if NTFS has anything to do with this or not... Anyone have any idea what these oopses mean?!? Best regards, Anton -- Anton Altaparmakov (replace at with @) Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net WWW: http://www.linux-ntfs.org/ & http://www-stu.christs.cam.ac.uk/~aia21/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/