Date: Sun, 18 Feb 2007 20:01:02 -0500
From: Chris Mason
To: Miklos Szeredi
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: dirty balancing deadlock
Message-ID: <20070219010102.GC9289@think.oraclecorp.com>
References: <20070218125307.4103c04a.akpm@linux-foundation.org>
 <20070218145929.547c21c7.akpm@linux-foundation.org>
 <20070218155916.0d3c73a9.akpm@linux-foundation.org>
 <20070219004537.GB9289@think.oraclecorp.com>

On Mon, Feb 19, 2007 at 01:54:31AM +0100, Miklos Szeredi wrote:
> > > > > > If so, writes to B will decrease the dirty memory threshold.
> > > > >
> > > > > Yes, but not by enough. Say A dirties a 1100 pages, limit is 1000.
> > > > > Some pages queued for writeback (doesn't matter how much). B writes
> > > > > back 1, 1099 dirty remain in A, zero in B. balance_dirty_pages() for
> > > > > B doesn't know that there's nothing more to write back for B, it's
> > > > > just waiting there for those 1099, which'll never get written.
> > > >
> > > > hm, OK, arguable. I guess something like this..
> > >
> > > Doesn't help the fuse case, but does seem to help the loopback mount
> > > one.
> > >
> > > For fuse it's worse with the patch: now the write triggered by the
> > > balance recurses into fuse, with disastrous results, since the fuse
> > > writeback is now blocked on the userspace queue.
> > >
> > > fusexmp_fh_no D 40136678 0 505 494 506 504 (NOTLB)
> > > 08982b78 00000001 00000000 08f9f9b4 0805d8cb 089a75f8 08982b78 08f98000
> > > 08f98000 08f9f9dc 0805a38a 089a7100 08982680 08f9f9cc 08f98000 08f98000
> > > 085d8300 08982680 089a7100 08f9fa34 08183006 089a7100 08982680 089a7100 Call Trace:
> > > 08f9f9a0: [<0805d8cb>] switch_to_skas+0x3b/0x83
> > > 08f9f9b8: [<0805a38a>] _switch_to+0x49/0x99
> > > 08f9f9e0: [<08183006>] schedule+0x246/0x547
> > > 08f9fa38: [<08103c7e>] fuse_get_req_wp+0xe9/0x14a
> > > 08f9fa70: [<08103d2e>] fuse_writepage+0x4f/0x12c
> >
> > In general, writepage is supposed to do work without blocking on
> > expensive locks that will get pdflush and dirty reclaim stuck in this
> > fashion. You'll probably have to take the same approach reiserfs does
> > in data=journal mode, which is leaving the page dirty if fuse_get_req_wp
> > is going to block without making progress.
>
> Pdflush, and dirty reclaim set wbc->nonblocking to true.
> balance_dirty_pages and fsync don't. The problem here is that
> Andrew's patch is wrong to let balance_dirty_pages() try to write back
> pages from a different queue.

async or sync, writepage is supposed to either make progress or bail.
loopback aside, if the fuse call is blocking long term, you're going to
run into problems.

-chris
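
For illustration only, here is a minimal userspace sketch of the "make progress or bail" rule discussed above, modelled in Python rather than kernel C. Every name in it (Page, RequestQueue, try_get_slot, writepage, writeback_pass) is hypothetical and is not the fuse or VFS API. It assumes a bounded request queue standing in for the userspace queue that fuse_get_req_wp waits on, and it shows a page being left dirty when no slot is free instead of sleeping.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a page cache page."""
    index: int
    dirty: bool = True


class RequestQueue:
    """Hypothetical bounded queue of requests to a userspace server."""

    def __init__(self, free_slots: int):
        self.free_slots = free_slots
        self.queued = []

    def try_get_slot(self) -> bool:
        # Non-blocking: take a slot if one is free, otherwise report failure.
        if self.free_slots == 0:
            return False
        self.free_slots -= 1
        return True


def writepage(page: Page, queue: RequestQueue) -> bool:
    """Queue the page for writeback (progress) or leave it dirty (bail)."""
    if not queue.try_get_slot():
        # Getting a request would block: keep the page dirty so that a
        # later writeback pass can retry it, and return immediately.
        return False
    page.dirty = False
    queue.queued.append(page.index)
    return True


def writeback_pass(pages, queue: RequestQueue) -> int:
    """Try each dirty page once; return how many were queued. Never waits."""
    return sum(1 for p in pages if p.dirty and writepage(p, queue))
```

In this sketch a caller that gets 0 back from writeback_pass knows no progress was possible on this queue and can go do something else, which is the property the thread is asking writepage to have.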