From: Trond Myklebust
Subject: Re: lost interrupt after a signal?
Date: Thu, 22 May 2008 16:39:43 -0400
Message-ID: <1211488783.8361.8.camel@localhost>
References: <2A43EAAA-8AEC-4EA1-AAA6-1AE1C750DB4C@oracle.com>
In-Reply-To: <2A43EAAA-8AEC-4EA1-AAA6-1AE1C750DB4C@oracle.com>
To: Chuck Lever
Cc: Linux NFS Mailing List, Matthew Wilcox

On Thu, 2008-05-22 at 10:57 -0400, Chuck Lever wrote:
> We've been running some tests to understand how the 2.6.25
> "intr/nointr" behavior affects signal handling during I/O on NFS
> mounts.
>
> While running an Oracle database workload, we signal the database
> (this is a normal way administrative tools control database
> activity). Subsequently all of the I/O threads block on the inode
> mutex in nfs_invalidate_mapping() except this one:
>
> INFO: task oracle:27214 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> oracle        D f6d85e84  1592 27214      1
>        c93d2920 00200086 00000001 f6d85e84 c04a0080 c04a0080 c04a0080 c93d2b84
>        c93d2b84 c4021f80 00000001 cc072000 f341c900 f6d85e7c 10a1a042 f6d85e7c
>        cc072ddc c4021f80 03b7e000 cc072ddc c40082b4 c036e21c cc072dd4 00000001
> Call Trace:
>  [] io_schedule+0x4c/0x90
>  [] sync_page+0x2c/0x40
>  [] __wait_on_bit_lock+0x45/0x70
>  [] sync_page+0x0/0x40
>  [] __lock_page+0x73/0x80
>  [] wake_bit_function+0x0/0x80
>  [] invalidate_inode_pages2_range+0xb8/0x200
>  [] nfs_writepages+0x68/0x90 [nfs]
>  [] nfs_invalidate_mapping_nolock+0x1f/0xd0 [nfs]
>  [] nfs_invalidate_mapping+0x5a/0x60 [nfs]
>  [] nfs_file_read+0x85/0x120 [nfs]
>  [] do_sync_read+0xd5/0x120
>  [] __do_fault+0x1ca/0x400
>  [] __update_rq_clock+0x27/0x180
>  [] autoremove_wake_function+0x0/0x50
>  [] k_getrusage+0x1f5/0x200
>  [] security_file_permission+0xc/0x10
>  [] rw_verify_area+0x66/0xd0
>  [] getrusage+0x22/0x40
>  [] vfs_read+0xa1/0x140
>  [] do_sync_read+0x0/0x120
>  [] sys_pread64+0x6a/0x70
>  [] syscall_call+0x7/0xb
>
> I haven't looked too closely at this, but maybe the signal caused a
> lost I/O interrupt?
>
> What would be the next steps to troubleshoot this further?

'cat /proc/1592/status' should tell you if there is a signal that is
being blocked.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
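
For anyone following the suggestion above: the signal state in /proc/<pid>/status is reported as hexadecimal bitmasks (SigPnd, ShdPnd, SigBlk, SigIgn, SigCgt), where bit 0 stands for signal 1. Note that the hung-task report names the task as oracle:27214, so that is presumably the PID to inspect. Below is a minimal sketch of decoding those masks. The helper names (decode_mask, parse_signal_masks, read_signal_masks) are made up for illustration and are not part of any existing tool.

```python
import signal

# Mask fields found in /proc/<pid>/status (see proc(5)).
MASK_FIELDS = ("SigPnd", "ShdPnd", "SigBlk", "SigIgn", "SigCgt")


def decode_mask(hex_mask):
    """Return the signal numbers whose bits are set in a hex mask string."""
    value = int(hex_mask, 16)
    # Bit 0 corresponds to signal 1, bit 1 to signal 2, and so on.
    return [bit + 1 for bit in range(value.bit_length()) if (value >> bit) & 1]


def signal_name(number):
    """Map a signal number to its name, falling back to SIG<n> if unknown."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return "SIG%d" % number


def parse_signal_masks(status_text):
    """Extract and decode the signal mask fields from status file text."""
    masks = {}
    for line in status_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in MASK_FIELDS:
            masks[key] = decode_mask(rest.strip())
    return masks


def read_signal_masks(pid):
    """Read /proc/<pid>/status for a live process and decode its masks."""
    with open("/proc/%d/status" % pid) as handle:
        return parse_signal_masks(handle.read())
```

Calling read_signal_masks(27214) on the affected client and passing the SigBlk and SigPnd entries through signal_name() would show whether the signal sent to the database is pending but blocked in the stuck thread.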