Message-ID: <000501c74998$79b41490$4e00a8c0@erich2003>
From: "erich"
To: "Igmar Palsenberg"
Cc: "Andrew Morton", "linux kernel",
References: <20061130212248.1b49bd32.akpm@osdl.org> <20061206074008.2f308b2b.akpm@osdl.org> <20061214004213.13149a48.akpm@osdl.org> <20061214011042.7b279be6.akpm@osdl.org>
Subject: Re: 2.6.16.32 stuck in generic_file_aio_write()
Date: Tue, 6 Feb 2007 10:42:38 +0800
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Dear Igmar Palsenberg,

I cannot be sure that this is a hardware problem, but I am interested in
reproducing this case. If you tell me how your platform is configured, I
will try it and try to give you a good solution.

Is your RAID adapter running firmware version 1.42? That version of the
Areca firmware fixed some hardware bugs and the handling of a rare sg
length case.

Best Regards
Erich Chen

----- Original Message -----
From: "Igmar Palsenberg"
To: "Andrew Morton"
Cc: ; ; "erich"
Sent: Monday, February 05, 2007 6:24 PM
Subject: Re: 2.6.16.32 stuck in generic_file_aio_write()

>
>> Does the other machine have the same problems?
>
> It does. It seems to depend on the interrupt frequency: setting
> CONFIG_HZ=250 makes it appear only once a month or so; with
> CONFIG_HZ=1000 it will occur within a week. It happens a lot less on
> the other machine, which isn't under as much disk activity load.
>
>> Are you able to rule out a hardware failure?
>
> Well.. It's too much of a coincidence that 2 (almost identical) machines
> show the same weird behaviour. What strikes me is that only *disk*
> interrupts stop being handled after a while. The machine itself is alive,
> just all disk IO is blocked, which makes it pretty much useless.
>
> Erich, could this be some sort of hardware problem? I know it's a PITA to
> reproduce, but setting CONFIG_HZ to 1000 and bashing the machine with
> disk activity seems to help :)
>
>
> Regards,
>
>
> Igmar
>
> --
> Igmar Palsenberg
> JDI ICT
>
> Zutphensestraatweg 85
> 6953 CJ Dieren
> Tel: +31 (0)313 - 496741
> Fax: +31 (0)313 - 420996
> The Netherlands
>
> mailto: i.palsenberg@jdi-ict.nl