Date: Wed, 15 Feb 2006 03:10:15 -0800
From: Andrew Morton
To: "Ian E. Morgan"
Cc: linux-kernel@vger.kernel.org, rhardy@webcon.ca
Subject: Re: Process D-stated in generic_unplug_device
Message-Id: <20060215031015.718103b1.akpm@osdl.org>

"Ian E. Morgan" wrote:
>
> After a recent kernel upgrade (essentially 2.6.15.4 plus a selection of
> other patches from -git and -mm), we've had problems with proftpd getting
> stuck in D-state when being shut down. There is no hot-plugging involved
> here (so I'm confused about the call to generic_unplug_device).
>
> # cat /proc/6012/wchan
> sync_page
>
> A task dump shows:
>
> Feb 14 13:45:12 guru kernel: proftpd  D 00000005  0  6012  1  29519  9328 (NOTLB)
> Feb 14 13:45:12 guru kernel: da507c50 dd83b520 c03db380 00000005 c152a580 dfc74284 c15a8fc c01e84c2
> Feb 14 13:45:12 guru kernel: c15af8fc da506000 c01e84f4 c15af8fc 00004a32 9ae11a4 0000cd50 dd83b520
> Feb 14 13:45:12 guru kernel: dd83b648 da507ca8 da507cb0 c1400e20 da507c58 c03041b 00000000 c013bd28
> Feb 14 13:45:12 guru kernel: Call Trace:
> Feb 14 13:45:12 guru kernel: [] __generic_unplug_device+0x1a/0x30
> Feb 14 13:45:13 guru kernel: [] generic_unplug_device+0x1c/0x36
> Feb 14 13:45:13 guru kernel: [] io_schedule+0xe/0x16
> Feb 14 13:45:13 guru kernel: [] sync_page+0x3e/0x4b
> Feb 14 13:45:13 guru kernel: [] __wait_on_bit_lock+0x41/0x61
> Feb 14 13:45:13 guru kernel: [] sync_page+0x0/0x4b
> Feb 14 13:45:13 guru kernel: [] __lock_page+0x91/0x99
> Feb 14 13:45:13 guru kernel: [] wake_bit_function+0x0/0x34
> Feb 14 13:45:13 guru kernel: [] wake_bit_function+0x0/0x34
> Feb 14 13:45:13 guru kernel: [] filemap_nopage+0x29a/0x377
> Feb 14 13:45:13 guru kernel: [] do_no_page+0x65/0x243
> Feb 14 13:45:13 guru kernel: [] __handle_mm_fault+0x229/0x24d
> Feb 14 13:45:13 guru kernel: [] do_page_fault+0x1c2/0x5a5
> Feb 14 13:45:13 guru kernel: [] __copy_from_user_ll+0x40/0x6c
> Feb 14 13:45:13 guru kernel: [] apic_timer_interrupt+0x1c/0x24
> Feb 14 13:45:13 guru kernel: [] do_page_fault+0x0/0x5a5
> Feb 14 13:45:13 guru kernel: [] error_code+0x4f/0x54
> Feb 14 13:45:13 guru kernel: [] reiserfs_copy_from_user_to_file_region+0x4d/0xd8
> Feb 14 13:45:13 guru kernel: [] reiserfs_file_write+0x419/0x6ac
> Feb 14 13:45:13 guru kernel: [] inode_has_perm+0x49/0x6b
> Feb 14 13:45:13 guru kernel: [] prio_tree_insert+0xf3/0x187
> Feb 14 13:45:13 guru kernel: [] selinux_file_permission+0x113/0x159
> Feb 14 13:45:13 guru kernel: [] vfs_write+0xcc/0x18f
> Feb 14 13:45:13 guru kernel: [] sys_write+0x4b/0x74
> Feb 14 13:45:13 guru kernel: [] syscall_call+0x7/0xb
>
> Obviously this process is non-killable and requires a reboot to get us back
> into working order.
>

I assume that we're seeing some stack gunk there and it's really stuck in
sync_page()->io_schedule().
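For reference, the wait path in question looks roughly like this in
2.6.15-era mm/filemap.c (a simplified sketch, not a verbatim copy):
__lock_page() sleeps on the page's PG_locked bit with sync_page() as the
wait action, and only the I/O completion path will clear that bit.

	static int sync_page(void *word)
	{
		struct address_space *mapping;
		struct page *page;

		/* 'word' points at page->flags; recover the struct page */
		page = container_of((unsigned long *)word, struct page, flags);

		/*
		 * Unplug the request queue so any pending read for this page
		 * actually gets issued - this is how generic_unplug_device()
		 * ends up in the trace above.
		 */
		mapping = page_mapping(page);
		if (mapping && mapping->a_ops && mapping->a_ops->sync_page)
			mapping->a_ops->sync_page(page);

		/* sleep in D state until I/O completion unlocks the page */
		io_schedule();
		return 0;
	}

So if the completion interrupt for that read never arrives, the task sits
in io_schedule() forever, which matches the trace above.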
This is almost certainly caused by a block layer or (more likely) device
driver bug: we submitted a read I/O, we're waiting for the completion
interrupt and it never came.

Please test vanilla 2.6.16-rc3 and let us know what block device driver is
involved, which I/O scheduler, etc.

Also, try mounting the filesystems with `-o barrier=none'. That barrier code
seems to have been a source of many problems.
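Something along these lines would tell us that (the device name and mount
point below are only examples - substitute whatever the box actually uses):

	# active I/O scheduler is the name shown in square brackets
	cat /sys/block/hda/queue/scheduler

	# low-level driver bound to the disk
	readlink /sys/block/hda/device/driver

	# retest with write barriers disabled on the affected reiserfs mount
	mount -o remount,barrier=none /home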