Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755651AbXKJWzM (ORCPT ); Sat, 10 Nov 2007 17:55:12 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1754738AbXKJWzA (ORCPT ); Sat, 10 Nov 2007 17:55:00 -0500 Received: from smtp2.linux-foundation.org ([207.189.120.14]:52803 "EHLO smtp2.linux-foundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754704AbXKJWy7 (ORCPT ); Sat, 10 Nov 2007 17:54:59 -0500 Date: Sat, 10 Nov 2007 14:54:44 -0800 From: Andrew Morton To: Mikulas Patocka Cc: linux-kernel@vger.kernel.org, Peter Zijlstra , WU Fengguang Subject: Re: Temporary lockup on loopback block device Message-Id: <20071110145444.a6993df1.akpm@linux-foundation.org> In-Reply-To: References: X-Mailer: Sylpheed 2.4.1 (GTK+ 2.8.17; x86_64-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2991 Lines: 75 On Sat, 10 Nov 2007 20:51:31 +0100 (CET) Mikulas Patocka wrote: > Hi > > I am experiencing a transient lockup in 'D' state with loopback device. It > happens when process writes to a filesystem in loopback with command like > dd if=/dev/zero of=/s/fill bs=4k > > CPU is idle, disk is idle too, yet the dd process is waiting in 'D' in > congestion_wait called from balance_dirty_pages. > > After about 30 seconds, the lockup is gone and dd resumes, but it locks up > soon again. > > I added a printk to the balance_dirty_pages > printk("wait: nr_reclaimable %d, nr_writeback %d, dirty_thresh %d, > pages_written %d, write_chunk %d\n", nr_reclaimable, > global_page_state(NR_WRITEBACK), dirty_thresh, pages_written, > write_chunk); > > and it shows this during the lockup: > > wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, > pages_written 1021, write_chunk 1522 > wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, > pages_written 1021, write_chunk 1522 > wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, > pages_written 1021, write_chunk 1522 > > What apparently happens: > > writeback_inodes syncs inodes only on the given wbc->bdi, however > balance_dirty_pages checks against global counts of dirty pages. So if > there's nothing to sync on a given device, but there are other dirty pages > so that the counts are over the limit, it will loop without doing any > work. > > To reproduce it, you need totally idle machine (no GUI, etc.) -- if > something writes to the backing device, it flushes the dirty pages > generated by the loopback and the lockup is gone. If you add printk, don't > forget to stop klogd, otherwise logging would end the lockup. erk. > The hotfix (that I verified to work) is to not set wbc->bdi, so that all > devices are flushed ... but the code probably needs some redesign (i.e. > either account per-device and flush per-device, or account-global and > flush-global). > > Mikulas > > > diff -u -r ../x/linux-2.6.23.1/mm/page-writeback.c mm/page-writeback.c > --- ../x/linux-2.6.23.1/mm/page-writeback.c 2007-10-12 18:43:44.000000000 +0200 > +++ mm/page-writeback.c 2007-11-10 20:32:43.000000000 +0100 > @@ -214,7 +214,6 @@ > > for (;;) { > struct writeback_control wbc = { > - .bdi = bdi, > .sync_mode = WB_SYNC_NONE, > .older_than_this = NULL, > .nr_to_write = write_chunk, Arguably we just have the wrong backing-device here, and what we should do is to propagate the real backing device's pointer through up into the filesystem. There's machinery for this which things like DM stacks use. I wonder if the post-2.6.23 changes happened to make this problem go away. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/