Return-Path: linux-nfs-owner@vger.kernel.org
Received: from mail-vc0-f177.google.com ([209.85.220.177]:41013 "EHLO
	mail-vc0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752507AbaJBNqA convert rfc822-to-8bit (ORCPT );
	Thu, 2 Oct 2014 09:46:00 -0400
Received: by mail-vc0-f177.google.com with SMTP id hq11so1366341vcb.8 for ;
	Thu, 02 Oct 2014 06:45:59 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <242225503.126017.1412240463629.JavaMail.zimbra@opinsys.fi>
References: <1508157147.125995.1412239112248.JavaMail.zimbra@opinsys.fi>
	<242225503.126017.1412240463629.JavaMail.zimbra@opinsys.fi>
Date: Thu, 2 Oct 2014 09:45:59 -0400
Message-ID:
Subject: Re: [RFC]: make nfs_wait_on_request() KILLABLE
From: Trond Myklebust
To: Tuomas Räsänen
Cc: Linux NFS Mailing List
Content-Type: text/plain; charset=UTF-8
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On Thu, Oct 2, 2014 at 5:01 AM, Tuomas Räsänen wrote:
> Hi
>
> Before David Jefferey's commit:
>
>   92a5655 nfs: Don't busy-wait on SIGKILL in __nfs_iocounter_wait
>
> we often experienced softlockups in our systems due to busy-looping
> after SIGKILL.
>
> With that patch applied, the frequency of softlockups has decreased
> but they are not completely gone. Now softlockups happen with
> following kind of call traces:
>
>   [] ? kvm_clock_get_cycles+0x17/0x20
>   [] ? ktime_get_ts+0x48/0x140
>   [] ? nfs_free_request+0x90/0x90 [nfs]
>   [] io_schedule+0x86/0x100
>   [] nfs_wait_bit_uninterruptible+0xd/0x20 [nfs]
>   [] __wait_on_bit+0x51/0x70
>   [] ? nfs_free_request+0x90/0x90 [nfs]
>   [] ? nfs_free_request+0x90/0x90 [nfs]
>   [] out_of_line_wait_on_bit+0x5b/0x70
>   [] ? autoremove_wake_function+0x40/0x40
>   [] nfs_wait_on_request+0x2e/0x30 [nfs]
>   [] nfs_updatepage+0x11e/0x7d0 [nfs]
>   [] ? nfs_page_find_request+0x3b/0x50 [nfs]
>   [] ? nfs_flush_incompatible+0x6d/0xe0 [nfs]
>   [] nfs_write_end+0x110/0x280 [nfs]
>   [] ? kmap_atomic_prot+0xe2/0x100
>   [] ? __kunmap_atomic+0x63/0x80
>   [] generic_file_buffered_write+0x132/0x210
>   [] __generic_file_aio_write+0x25d/0x460
>   [] ? __nfs_revalidate_inode+0x102/0x2e0 [nfs]
>   [] generic_file_aio_write+0x53/0x90
>   [] nfs_file_write+0xa7/0x1d0 [nfs]
>   [] ? common_file_perm+0x4b/0xe0
>   [] do_sync_write+0x57/0x90
>   [] ? do_sync_readv_writev+0x80/0x80
>   [] vfs_write+0x95/0x1b0
>   [] SyS_write+0x49/0x90
>   [] syscall_call+0x7/0x7
>   [] ? balance_dirty_pages.isra.18+0x390/0x4c3
>
> As I understand it, there are some outstanding requests going on which
> nfs_wait_on_request() is waiting for. For some reason, they are not
> finished in timely manner and the process is eventually killed with

Why are those outstanding requests not completing, and why would
killing the tasks that are waiting for that completion help?

> SIGKILL by admin. However, nfs_wait_on_request() has set the task
> state TASK_UNINTERRUPTIBLE and it does not get killed.
>
> Why nfs_wait_on_request() is UNINTERRUPTIBLE instead of KILLABLE?

Please see the changelog entry in
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9f557cd80731

Cheers
  Trond

-- 
Trond Myklebust

Linux NFS client maintainer, PrimaryData

trond.myklebust@primarydata.com