Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1945954AbXBVICu (ORCPT ); Thu, 22 Feb 2007 03:02:50 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1945951AbXBVICt (ORCPT ); Thu, 22 Feb 2007 03:02:49 -0500 Received: from mail-gw2.sa.eol.hu ([212.108.200.109]:44686 "EHLO mail-gw2.sa.eol.hu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1945947AbXBVICs (ORCPT ); Thu, 22 Feb 2007 03:02:48 -0500 To: akpm@linux-foundation.org CC: miklos@szeredi.hu, linux-kernel@vger.kernel.org, linux-mm@kvack.org In-reply-to: <20070221235532.2361f827.akpm@linux-foundation.org> (message from Andrew Morton on Wed, 21 Feb 2007 23:55:32 -0800) Subject: Re: dirty balancing deadlock References: <20070218125307.4103c04a.akpm@linux-foundation.org> <20070218145929.547c21c7.akpm@linux-foundation.org> <20070218155916.0d3c73a9.akpm@linux-foundation.org> <20070221133631.a5cbf49f.akpm@linux-foundation.org> <20070221235532.2361f827.akpm@linux-foundation.org> Message-Id: From: Miklos Szeredi Date: Thu, 22 Feb 2007 09:02:24 +0100 Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3122 Lines: 72 > > On Thu, 22 Feb 2007 08:42:26 +0100 Miklos Szeredi wrote: > > > > > > > > Index: linux/mm/page-writeback.c > > > > =================================================================== > > > > --- linux.orig/mm/page-writeback.c 2007-02-19 17:32:41.000000000 +0100 > > > > +++ linux/mm/page-writeback.c 2007-02-19 18:05:28.000000000 +0100 > > > > @@ -198,6 +198,25 @@ static void balance_dirty_pages(struct a > > > > dirty_thresh) > > > > break; > > > > > > > > + /* > > > > + * Acquit this producer if there's little or nothing > > > > + * to write back to this particular queue > > > > + * > > > > + * Without this check a deadlock is possible in the > > > > + * following case: > > > > + * > > > > + * - filesystem A writes data through filesystem B > > > > + * - filesystem A has dirty pages over dirty_thresh > > > > + * - writeback is started, this triggers a write in B > > > > + * - balance_dirty_pages() is called synchronously > > > > + * - the write to B blocks > > > > + * - the writeback completes, but dirty is still over threshold > > > > + * - the blocking write prevents futher writes from happening > > > > + */ > > > > + if (atomic_long_read(&bdi->nr_dirty) + > > > > + atomic_long_read(&bdi->nr_writeback) < 16) > > > > + break; > > > > + > > > > > > The problem seems to that little "- the write to B blocks". > > > > > > How come it blocks? I mean, if we cannot retire writes to that filesystem > > > then we're screwed anyway. > > > > Sorry about the sloppy description. I mean, it's not the lowlevel > > write that will block, but rather the VFS one > > (generic_file_aio_write). It will block (or rather loop forever with > > 0.1 second sleeps) in balance_dirty_pages(). That means, that for > > this inode, i_mutex is held and no other writer can continue the work. > > "this inode" I assume is the inode against filesystem A? No, the one in B. > Why does holding that inode's i_mutex prevent further writeback of > pages in A? It is generic_file_aio_write() that is holding the mutex. Here's the stack for the filesystem daemon trying to write back a page: 08dcfb40: [<08182fe6>] schedule+0x246/0x547 08dcfb98: [<08183a03>] schedule_timeout+0x4e/0xb6 08dcfbcc: [<08183991>] io_schedule_timeout+0x11/0x20 08dcfbd4: [<080a0cf2>] congestion_wait+0x72/0x87 08dcfc04: [<0809c693>] balance_dirty_pages+0xa8/0x153 08dcfc5c: [<0809c7bf>] balance_dirty_pages_ratelimited_nr+0x43/0x45 08dcfc68: [<080992b5>] generic_file_buffered_write+0x3e3/0x6f5 08dcfd20: [<0809988e>] __generic_file_aio_write_nolock+0x2c7/0x5dd 08dcfda8: [<08099cb6>] generic_file_aio_write+0x55/0xc7 08dcfddc: [<080ea1e6>] ext3_file_write+0x39/0xaf 08dcfe04: [<080b060b>] do_sync_write+0xd8/0x10e 08dcfebc: [<080b06e3>] vfs_write+0xa2/0x1cb 08dcfeec: [<080b09b8>] sys_pwrite64+0x65/0x69 Miklos - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/