From: Rick Warner <rick@microway.com>
Organization: Microway, Inc.
To: linux-kernel@vger.kernel.org
Subject: very strange issue with sata,<4G Ram, and ext3
Date: Thu, 28 Apr 2005 12:16:07 -0400
User-Agent: KMail/1.7.2
Message-Id: <200504281216.08026.rick@microway.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2304
Lines: 44

Hello,
 We are seeing a very strange issue on some 64-bit systems.  We have a 32-node
cluster of EM64T machines (Supermicro boards), and we are using our node
restore software to propagate a Linux install onto them.  Each node PXE boots
a kernel and initrd image.  The initrd contains some config info, a basic
root filesystem, and a restore script.  The kernel is passed init=/restore
(the restore script itself).  The script runs DHCP to get an IP address, then
NFS mounts the export on the master node of the cluster, where the backup
image is stored.  It then applies a backed-up partition table, runs mkfs on
the partitions, mounts them, untars a backup tarball onto the drive, and
finally makes the drive bootable with grub.
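
 The restore sequence described above can be sketched roughly as follows.
This is only an illustrative outline, not the actual restore script: the
device names, mount points, export path, file names, and the choice of
dhclient, sfdisk, and grub-install are all assumptions made for the example.
The function only builds the list of commands; it does not run anything.

```python
def restore_plan(master, disk="/dev/sda", layout=(("/dev/sda1", "/"),),
                 iface="eth0", export="/export/backup", fstype="ext3"):
    """Return the restore steps, in order, as shell command strings.

    Nothing is executed here. Every path, device, and file name is a
    hypothetical placeholder, not taken from the real restore script.
    `layout` pairs each partition with its mount point, parents first.
    """
    nfs, target = "/mnt/master", "/mnt/target"  # assumed mount points
    steps = [
        f"dhclient {iface}",                        # get an IP via DHCP (client assumed)
        f"mkdir -p {nfs} {target}",
        f"mount -t nfs {master}:{export} {nfs}",    # NFS mount the master node
        f"sfdisk {disk} < {nfs}/partition.table",   # apply the saved partition table
    ]
    # Create a filesystem on every partition first ...
    for dev, _ in layout:
        steps.append(f"mkfs -t {fstype} {dev}")
    # ... then mount them all under the target directory.
    for dev, mountpoint in layout:
        where = target + mountpoint.rstrip("/")
        steps.append(f"mkdir -p {where}")
        steps.append(f"mount -t {fstype} {dev} {where}")
    steps.append(f"tar -xpf {nfs}/backup.tar -C {target}")          # untar the backup
    steps.append(f"grub-install --root-directory={target} {disk}")  # make it bootable
    return steps


if __name__ == "__main__":
    for cmd in restore_plan("master"):
        print(cmd)
```

 In terms of this outline, the failures begin at the tar step with 2G of RAM
and at the mkfs step with 512M.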

 On these systems, we get ext2 errors from the initrd during the untar step.
Soon after, we start getting segfaults in random processes (apparently
triggered by the still-running DHCP client), and then a continuous stream of
segfaults in the restore script itself (restore[1]).

 The systems being restored are dual EM64T machines with 2G of RAM and 200G
SATA drives.  If we increase the memory to 4G, the restores complete without
error.  If we reduce it to 512M, the segfaults start at the mkfs stage
instead of the untar stage.  We've tried different SATA drives and
controllers with no change.  Switching to IDE drives works.  Switching to
reiserfs instead of ext3 for the destination drives works too.  We've enabled
the SCSI debug options, the jbd debug options for ext3, and the general
kernel debug options without getting any more info.  We've also tried the
deprecated IDE-based SATA drivers instead of the SCSI-based ones, without
success.  We see the same errors when restoring to Intel's Jarell EM64T
systems and to an Arima HDAMA Opteron.  We've also tried adding swap space as
early as possible in the initrd image.

 This problem is really baffling us, and we're not sure what to look into
next.  Any ideas?


-- 
Richard Warner
Lead Systems Integrator
Microway, Inc
(508)732-5517