Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754551AbXKJTvl (ORCPT ); Sat, 10 Nov 2007 14:51:41 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753476AbXKJTvd (ORCPT ); Sat, 10 Nov 2007 14:51:33 -0500 Received: from artax.karlin.mff.cuni.cz ([195.113.31.125]:57539 "EHLO artax.karlin.mff.cuni.cz" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753347AbXKJTvc (ORCPT ); Sat, 10 Nov 2007 14:51:32 -0500 Date: Sat, 10 Nov 2007 20:51:31 +0100 (CET) From: Mikulas Patocka To: linux-kernel@vger.kernel.org Subject: Temporary lockup on loopback block device Message-ID: X-Personality-Disorder: Schizoid MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2474 Lines: 65 Hi I am experiencing a transient lockup in 'D' state with loopback device. It happens when process writes to a filesystem in loopback with command like dd if=/dev/zero of=/s/fill bs=4k CPU is idle, disk is idle too, yet the dd process is waiting in 'D' in congestion_wait called from balance_dirty_pages. After about 30 seconds, the lockup is gone and dd resumes, but it locks up soon again. I added a printk to the balance_dirty_pages printk("wait: nr_reclaimable %d, nr_writeback %d, dirty_thresh %d, pages_written %d, write_chunk %d\n", nr_reclaimable, global_page_state(NR_WRITEBACK), dirty_thresh, pages_written, write_chunk); and it shows this during the lockup: wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, pages_written 1021, write_chunk 1522 wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, pages_written 1021, write_chunk 1522 wait: nr_reclaimable 3099, nr_writeback 0, dirty_thresh 2985, pages_written 1021, write_chunk 1522 What apparently happens: writeback_inodes syncs inodes only on the given wbc->bdi, however balance_dirty_pages checks against global counts of dirty pages. So if there's nothing to sync on a given device, but there are other dirty pages so that the counts are over the limit, it will loop without doing any work. To reproduce it, you need totally idle machine (no GUI, etc.) -- if something writes to the backing device, it flushes the dirty pages generated by the loopback and the lockup is gone. If you add printk, don't forget to stop klogd, otherwise logging would end the lockup. The hotfix (that I verified to work) is to not set wbc->bdi, so that all devices are flushed ... but the code probably needs some redesign (i.e. either account per-device and flush per-device, or account-global and flush-global). Mikulas diff -u -r ../x/linux-2.6.23.1/mm/page-writeback.c mm/page-writeback.c --- ../x/linux-2.6.23.1/mm/page-writeback.c 2007-10-12 18:43:44.000000000 +0200 +++ mm/page-writeback.c 2007-11-10 20:32:43.000000000 +0100 @@ -214,7 +214,6 @@ for (;;) { struct writeback_control wbc = { - .bdi = bdi, .sync_mode = WB_SYNC_NONE, .older_than_this = NULL, .nr_to_write = write_chunk, - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/