Message-ID: <3E1D49BF.5114A896@digeo.com>
Date: Thu, 09 Jan 2003 02:06:55 -0800
From: Andrew Morton <akpm@digeo.com>
MIME-Version: 1.0
To: yuval yeret <yuval_yeret@hotmail.com>
CC: linux-kernel@vger.kernel.org, yuval@exanet.com
Subject: Re: 2.4.18-14 kernel stuck during ext3 umount with ping still responding
References: <F12N44yQegpeDBHkKx400013b3e@hotmail.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2971
Lines: 83

yuval yeret wrote:
> 
> Hi,
> 
> I'm running a 2.4.18-14 kernel with a heavy IO profile using ext3 over RAID
> 0+1 volumes.
> 
> >From time to time I get a black screen stuck machine while trying to umount
> a volume during an IO workload (as part of a failback solution - but after
> killing all IO processes ), with ping still responding, but everything else
> mostly dead.
> 
> I tried using the forcedumount patch to solve this problem - to no avail.
> Also tried upgrading the qlogic drivers to the latest drivers from Qlogic.
> 
> After one of the occurences I managed to get some output using the sysrq
> keys.
> 
> This seems similar to what is described in
> http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=77508 but with a
> different call trace
> 
> What I have here is what I managed to copy down (for some reason pgup/pgdown
> didn't work so not all information is full...) together with a manual lookup
> of the call trace
> from /proc/ksyms :
> 
> process umount
> EIP c01190b8                    (set_running_and_schedule)
> call trace:
> c01144c9        f25f9ec0        IO_APIC_get_PCI_irq_vector
> c010a8b0        f25f9ed0        enable_irq
> c014200c        f25f9ef0        fsync_buffers_list
> c0155595        f25f9efc        clear_inode
> c015553d        f25f9f2c        invalidate_inodes
> c01461d8        f25f9f78        get_super
> c014a629        f25f9f94        path_release
> c0157c58        f25f9fc0        sys_umount
> c0108cab                        sys_sigaltstack
> 
> Any idea what can cause this ?
> 

If you have a large amount of data against two or more filesystems,
and you try to unmount one of them the kernel can seize up for a
very long time in the fsync_dev()->sync_buffers() function.  Under
these circumstances that function has O(n*n) search complexity
and n is quite large.

However your backtrace shows neither of those functions.

Still, as an experiment it would be interesting to see if the below
patch fixes it up.  It converts O(n*n) to O(m), where m > n.


 fs/buffer.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

--- 2420/fs/buffer.c~a	Thu Jan  9 02:03:06 2003
+++ 2420-akpm/fs/buffer.c	Thu Jan  9 02:04:02 2003
@@ -307,11 +307,11 @@ int sync_buffers(kdev_t dev, int wait)
 	 * 2) write out all dirty, unlocked buffers;
 	 * 2) wait for completion by waiting for all buffers to unlock.
 	 */
-	write_unlocked_buffers(dev);
+	write_unlocked_buffers(NODEV);
 	if (wait) {
-		err = wait_for_locked_buffers(dev, BUF_DIRTY, 0);
+		err = wait_for_locked_buffers(NODEV, BUF_DIRTY, 0);
 		write_unlocked_buffers(dev);
-		err |= wait_for_locked_buffers(dev, BUF_LOCKED, 1);
+		err |= wait_for_locked_buffers(NODEV, BUF_LOCKED, 1);
 	}
 	return err;
 }

_
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/