Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752623AbXBSAsO (ORCPT ); Sun, 18 Feb 2007 19:48:14 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752628AbXBSAsO (ORCPT ); Sun, 18 Feb 2007 19:48:14 -0500 Received: from rgminet01.oracle.com ([148.87.113.118]:62364 "EHLO rgminet01.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752623AbXBSAsN (ORCPT ); Sun, 18 Feb 2007 19:48:13 -0500 Date: Sun, 18 Feb 2007 19:45:37 -0500 From: Chris Mason To: Miklos Szeredi Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: dirty balancing deadlock Message-ID: <20070219004537.GB9289@think.oraclecorp.com> References: <20070218125307.4103c04a.akpm@linux-foundation.org> <20070218145929.547c21c7.akpm@linux-foundation.org> <20070218155916.0d3c73a9.akpm@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.12-2006-07-14 X-Brightmail-Tracker: AAAAAQAAAAI= X-Brightmail-Tracker: AAAAAQAAAAI= X-Whitelist: TRUE X-Whitelist: TRUE Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2150 Lines: 44 On Mon, Feb 19, 2007 at 01:25:21AM +0100, Miklos Szeredi wrote: > > > > If so, writes to B will decrease the dirty memory threshold. > > > > > > Yes, but not by enough. Say A dirties a 1100 pages, limit is 1000. > > > Some pages queued for writeback (doesn't matter how much). B writes > > > back 1, 1099 dirty remain in A, zero in B. balance_dirty_pages() for > > > B doesn't know that there's nothing more to write back for B, it's > > > just waiting there for those 1099, which'll never get written. > > > > hm, OK, arguable. I guess something like this.. > > Doesn't help the fuse case, but does seem to help the loopback mount > one. > > For fuse it's worse with the patch: now the write triggered by the > balance recurses into fuse, with disastrous results, since the fuse > writeback is now blocked on the userspace queue. > > fusexmp_fh_no D 40136678 0 505 494 506 504 (NOTLB) > 08982b78 00000001 00000000 08f9f9b4 0805d8cb 089a75f8 08982b78 08f98000 > 08f98000 08f9f9dc 0805a38a 089a7100 08982680 08f9f9cc 08f98000 08f98000 > 085d8300 08982680 089a7100 08f9fa34 08183006 089a7100 08982680 089a7100 Call Trace: > 08f9f9a0: [<0805d8cb>] switch_to_skas+0x3b/0x83 > 08f9f9b8: [<0805a38a>] _switch_to+0x49/0x99 > 08f9f9e0: [<08183006>] schedule+0x246/0x547 > 08f9fa38: [<08103c7e>] fuse_get_req_wp+0xe9/0x14a > 08f9fa70: [<08103d2e>] fuse_writepage+0x4f/0x12c In general, writepage is supposed to do work without blocking on expensive locks that will get pdflush and dirty reclaim stuck in this fashion. You'll probably have to take the same approach reiserfs does in data=journal mode, which is leaving the page dirty if fuse_get_req_wp is going to block without making progress. Queue it somewhere else (ie an internal Fs cleaning thread) and leave the page dirty so that we can move on to other pages that have a chance of being cleaned. -chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/