Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760341AbYCSTwk (ORCPT ); Wed, 19 Mar 2008 15:52:40 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1756710AbYCSTgW (ORCPT ); Wed, 19 Mar 2008 15:36:22 -0400 Received: from relay1.sgi.com ([192.48.171.29]:37291 "EHLO relay.sgi.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1752231AbYCSTgE (ORCPT ); Wed, 19 Mar 2008 15:36:04 -0400 Date: Wed, 19 Mar 2008 23:17:21 +1100 From: David Chinner To: Jan Kara Cc: David Chinner , lkml , linux-fsdevel Subject: Re: BUG: drop_pagecache_sb vs kjournald lockup Message-ID: <20080319121721.GH155407@sgi.com> References: <20080318112843.GJ95344431@sgi.com> <20080318134326.GA6558@atrey.karlin.mff.cuni.cz> <20080318204646.GY155407@sgi.com> <20080319100805.GB22109@duck.suse.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080319100805.GB22109@duck.suse.cz> User-Agent: Mutt/1.4.2.1i Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2339 Lines: 56 On Wed, Mar 19, 2008 at 11:08:05AM +0100, Jan Kara wrote: > On Wed 19-03-08 07:46:46, David Chinner wrote: > > On Tue, Mar 18, 2008 at 02:43:26PM +0100, Jan Kara wrote: > > > > 2.6.25-rc3, 4p ia64, ext3 root drive. > > > > > > > > I was running an XFS stress test on one of the XFS partitions on > > > > the machine (zero load on the root ext3 drive), when the system > > > > locked up in kjournald with this on the console: > > > > > > > > BUG: spinlock lockup on CPU#2, kjournald/2150, a000000100e022e0 > > > > > > > > > > > > > > > > > Anyone know the reason why drop_pagecache_sb() uses such a brute-force > > > > mechanism to free up clean page cache pages? > > > Yes, we know that drop_pagecache_sb() has locking issues but since it > > > is intended to be used for debugging purposes only, nobody cared enough > > > to fix it. Completely untested patch below if you dare to try ;) > > > > It may be intended for debuging purposes, but it does get used in > > production HPC environments (a lot!). I guess I've never seen this > > lockup before because SGI customers don't use ext3, but they have > > complained about the system "stopping" while drop_caches is executed. > > This locking ..... strategy would explain it, though. > ;) But in case your customers use it in production, shouldn't you push > for a better interface for such feature? Just a thought... Because it's much less horrible than the thing it replaced, does what is needed and mostly does not cause problems? And because it usually just works I've never felt the need to look at the implementation.... /me is off to look at Fengguang's patch > > I'll try the patch, but I can't guarantee anything - I only saw this > > lockup once in about 18 hours when dropping caches every 2 seconds. > Well, if you enable lockdep, it should warn you about possible problems > much earlier (at least we already got several reports of lockdep warnings > when using drop_caches). ia64 - no lockdep :/ Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/