From: Chris Boot
Date: Sat, 18 Aug 2007 17:28:11 +0100
To: linux-kernel@vger.kernel.org
Subject: Re: Panic with XFS on RHEL5 (2.6.18-8.1.8.el5)
Message-ID: <46C71E1B.40702@bootc.net>
References: <46C6C145.70506@abacustree.com> <46C6FF93.4040509@bootc.net>

Måns Rullgård wrote:
> Chris Boot writes:
>
>> Måns Rullgård wrote:
>>
>>> Chris Boot writes:
>>>
>>>> All,
>>>>
>>>> I've got a box running RHEL5 and haven't been impressed by ext3
>>>> performance on it (running off a 1.5TB HP MSA20 using the cciss
>>>> driver). I compiled XFS as a module and tried it out since I'm used to
>>>> using it on Debian, which runs much more efficiently. However, every
>>>> so often the kernel panics as below. Apologies for the tainted kernel,
>>>> but we run VMware Server on the box as well.
>>>>
>>>> Does anyone have any hints/tips for using XFS on Red Hat? What's
>>>> causing the panic below, and is there a way around it?
>>>>
>>>> BUG: unable to handle kernel paging request at virtual address b8af9d60
>>>> printing eip:
>>>> c0415974
>>>> *pde = 00000000
>>>> Oops: 0000 [#1]
>>>> SMP
>>>> last sysfs file: /block/loop7/dev
> [...]
>>>> [] xfsbufd_wakeup+0x28/0x49 [xfs]
>>>> [] shrink_slab+0x56/0x13c
>>>> [] try_to_free_pages+0x162/0x23e
>>>> [] __alloc_pages+0x18d/0x27e
>>>> [] find_or_create_page+0x53/0x8c
>>>> [] __getblk+0x162/0x270
>>>> [] do_lookup+0x53/0x157
>>>> [] ext3_getblk+0x7c/0x233 [ext3]
>>>> [] ext3_getblk+0xeb/0x233 [ext3]
>>>> [] mntput_no_expire+0x11/0x6a
>>>> [] ext3_bread+0x13/0x69 [ext3]
>>>> [] htree_dirblock_to_tree+0x22/0x113 [ext3]
>>>> [] ext3_htree_fill_tree+0x58/0x1a0 [ext3]
>>>> [] do_path_lookup+0x20e/0x25f
>>>> [] get_empty_filp+0x99/0x15e
>>>> [] ext3_permission+0x0/0xa [ext3]
>>>> [] ext3_readdir+0x1ce/0x59b [ext3]
>>>> [] filldir+0x0/0xb9
>>>> [] sys_fstat64+0x1e/0x23
>>>> [] vfs_readdir+0x63/0x8d
>>>> [] filldir+0x0/0xb9
>>>> [] sys_getdents+0x5f/0x9c
>>>> [] syscall_call+0x7/0xb
>>>> =======================
>>>>
>>> Your Redhat kernel is probably built with 4k stacks, and XFS+loop+ext3
>>> seems to be enough to overflow it.
>>>
>> Thanks, that explains a lot. However, I don't have any XFS filesystems
>> mounted over loop devices on ext3. Earlier in the day I had iso9660 on
>> loop on xfs; could that have caused the issue? It was unmounted and
>> deleted when this panic occurred.
>
> The mention of /block/loop7/dev and the presence of both XFS and ext3
> functions in the call stack suggested to me that you might have an ext3
> filesystem in a loop device on XFS. I see no explanation for that call
> stack other than a stack overflow, but then we're still back at the
> same root cause.
>
> Are you using device-mapper and/or md? They too are known to blow 4k
> stacks when used with XFS.
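
As a sanity check on the 4k-stacks theory, something like the rough
sketch below (untested) would confirm it from the installed build
config. It assumes the Red Hat kernel RPM drops its config under /boot
as config-<version>; the default path in the code is only a guess based
on the kernel version in the subject line, and plain grep would of
course do the same job.

/*
 * check-4kstacks.c: rough, untested sketch.  Prints any CONFIG_4KSTACKS
 * line from a kernel config file: "CONFIG_4KSTACKS=y" means the
 * 4k-stack build, "# CONFIG_4KSTACKS is not set" means 8k stacks.
 *
 * Build:  cc -o check-4kstacks check-4kstacks.c
 * Usage:  ./check-4kstacks [/boot/config-2.6.18-8.1.8.el5]
 *         (the default path below is an assumption)
 */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1]
				    : "/boot/config-2.6.18-8.1.8.el5";
	char line[256];
	FILE *f = fopen(path, "r");

	if (!f) {
		perror(path);
		return 1;
	}

	while (fgets(line, sizeof(line), f))
		if (strstr(line, "CONFIG_4KSTACKS"))
			fputs(line, stdout);

	fclose(f);
	return 0;
}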

To answer the device-mapper/md question: I am. The situation earlier on
was iso9660 on loop on xfs on lvm on cciss. I guess that might have
smashed the stack undetectably and induced the corruption encountered
later on? When I experienced this panic the machine would probably have
been performing a backup, which was simply backing up a load of ext3/xfs
filesystems on lvm on the HP cciss controller. None of the loop devices
would have been mounted at the time.

I have a few machines now with 4k stacks using lvm + md + xfs and they
have no trouble at all, but none of them run Red Hat (all Debian) and
none use cciss either. Maybe it's a deadly combination.

>> I'll probably just try and recompile the kernel with 8k stacks and see
>> how it goes. Screw the support, we're unlikely to get it anyway. :-P
>
> Please report how this works out.

I will. This will probably be on Monday now, since the machine isn't
accepting SysRq requests over the serial console. :-(

Many thanks,
Chris
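
P.S. On the SysRq front, it may be worth checking that kernel.sysrq
isn't simply disabled (Red Hat has historically shipped kernel.sysrq = 0
in /etc/sysctl.conf). Failing that, and assuming the console really is a
PC-style serial port, the kernel expects a BREAK followed by the command
key within a few seconds; the untested sketch below sends that sequence
from the client end. The device name and baud rate are guesses and will
need adjusting to match the actual setup.

/*
 * sysrq-serial.c: rough, untested sketch.  Sends a serial BREAK and
 * then a SysRq command key, the sequence the kernel expects for Magic
 * SysRq on a PC-style serial console (see Documentation/sysrq.txt).
 * The target still needs CONFIG_MAGIC_SYSRQ and kernel.sysrq enabled.
 *
 * Build:  cc -o sysrq-serial sysrq-serial.c
 * Usage:  ./sysrq-serial [/dev/ttyS0] [t]    (t = show tasks)
 */
#include <fcntl.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "/dev/ttyS0"; /* guess */
	char key = argc > 2 ? argv[2][0] : 't';              /* SysRq key */
	struct termios tio;
	int fd;

	fd = open(dev, O_RDWR | O_NOCTTY);
	if (fd < 0) {
		perror(dev);
		return 1;
	}

	/* Raw 8N1; 115200 is only a guess at the console speed. */
	if (tcgetattr(fd, &tio) < 0) {
		perror("tcgetattr");
		return 1;
	}
	cfmakeraw(&tio);
	cfsetispeed(&tio, B115200);
	cfsetospeed(&tio, B115200);
	if (tcsetattr(fd, TCSANOW, &tio) < 0) {
		perror("tcsetattr");
		return 1;
	}

	/* BREAK first, then the command key inside the kernel's window. */
	if (tcsendbreak(fd, 0) < 0) {
		perror("tcsendbreak");
		return 1;
	}
	usleep(500000);

	if (write(fd, &key, 1) != 1) {
		perror("write");
		return 1;
	}
	tcdrain(fd);
	close(fd);
	return 0;
}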