Return-Path: Received: from mail-qt0-f174.google.com ([209.85.216.174]:35011 "EHLO mail-qt0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751401AbeBUToO (ORCPT ); Wed, 21 Feb 2018 14:44:14 -0500 Received: by mail-qt0-f174.google.com with SMTP id g14so3433846qti.2 for ; Wed, 21 Feb 2018 11:44:13 -0800 (PST) Message-ID: <1519242250.3719.9.camel@redhat.com> Subject: Re: question on kernel stack trace From: David Wysochanski To: "Mkrtchyan, Tigran" , linux-nfs Date: Wed, 21 Feb 2018 14:44:10 -0500 In-Reply-To: <1894283666.7881272.1519209992729.JavaMail.zimbra@desy.de> References: <1894283666.7881272.1519209992729.JavaMail.zimbra@desy.de> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Sender: linux-nfs-owner@vger.kernel.org List-ID: On Wed, 2018-02-21 at 11:46 +0100, Mkrtchyan, Tigran wrote: > On one of our nodes we got the following stack trace: > > > Feb 19 16:55:53 nafhh kernel: updatedb      D 000000000000000f     0 > 32087  32081 0x00000080 > Feb 19 16:55:53 nafhh kernel: ffff880852d13a88 0000000000000082 > 0000000000000000 ffffffffa00bfe13 > Feb 19 16:55:53 nafhh kernel: 0000000000000050 0000000300000001 > 000d5a7576495a7a 0000000000000286 > Feb 19 16:55:53 nafhh kernel: 0000000000000286 00000001dfd51213 > ffff8810207c5068 ffff880852d13fd8 > Feb 19 16:55:53 nafhh kernel: Call Trace: > Feb 19 16:55:53 nafhh kernel: [] ? > ext4_mark_inode_dirty+0x83/0x1d0 [ext4] > Feb 19 16:55:53 nafhh kernel: [] ? > __rpc_sleep_on_priority+0xf4/0x330 [sunrpc] > Feb 19 16:55:53 nafhh kernel: [] ? > rpc_wait_bit_killable+0x0/0xa0 [sunrpc] > Feb 19 16:55:53 nafhh kernel: [] > rpc_wait_bit_killable+0x42/0xa0 [sunrpc] > Feb 19 16:55:53 nafhh kernel: [] > __wait_on_bit+0x5f/0x90 > Feb 19 16:55:53 nafhh kernel: [] ? > xprt_reserve_xprt+0x87/0x110 [sunrpc] > Feb 19 16:55:53 nafhh kernel: [] ? > rpc_wait_bit_killable+0x0/0xa0 [sunrpc] > Feb 19 16:55:53 nafhh kernel: [] > out_of_line_wait_on_bit+0x78/0x90 > Feb 19 16:55:53 nafhh kernel: [] ? > wake_bit_function+0x0/0x50 > Feb 19 16:55:53 nafhh kernel: [] > __rpc_execute+0xf5/0x350 [sunrpc] > Feb 19 16:55:53 nafhh kernel: [] ? > bit_waitqueue+0x17/0xd0 > Feb 19 16:55:53 nafhh kernel: [] > rpc_execute+0x61/0xa0 [sunrpc] > Feb 19 16:55:53 nafhh kernel: [] > rpc_run_task+0x75/0x90 [sunrpc] > Feb 19 16:55:53 nafhh kernel: [] > rpc_call_sync+0x42/0x70 [sunrpc] > Feb 19 16:55:53 nafhh kernel: [] > _nfs4_call_sync+0x3e/0x40 [nfs] > Feb 19 16:55:53 nafhh kernel: [] > _nfs4_proc_getattr+0xac/0xc0 [nfs] > Feb 19 16:55:53 nafhh kernel: [] > nfs4_proc_getattr+0x56/0x80 [nfs] > Feb 19 16:55:53 nafhh kernel: [] > __nfs_revalidate_inode+0xe3/0x220 [nfs] > Feb 19 16:55:53 nafhh kernel: [] > nfs_getattr+0xde/0x210 [nfs] > Feb 19 16:55:53 nafhh kernel: [] > vfs_getattr+0x51/0x80 > Feb 19 16:55:53 nafhh kernel: [] ? > cp_new_stat+0xe4/0x100 > Feb 19 16:55:53 nafhh kernel: [] > vfs_fstatat+0x64/0xa0 > Feb 19 16:55:53 nafhh kernel: [] > vfs_lstat+0x1e/0x20 > Feb 19 16:55:53 nafhh kernel: [] > sys_newlstat+0x24/0x50 > Feb 19 16:55:53 nafhh kernel: [] ? > system_call_after_swapgs+0x187/0x220 > Feb 19 16:55:53 nafhh kernel: [] ? > system_call_after_swapgs+0x180/0x220 > Feb 19 16:55:53 nafhh kernel: [] ? > audit_syscall_entry+0x1d7/0x200 > Feb 19 16:55:53 nafhh kernel: [] ? > system_call_after_swapgs+0x16b/0x220 > Feb 19 16:55:53 nafhh kernel: [] ? > system_call_after_swapgs+0x164/0x220 > Feb 19 16:55:53 nafhh kernel: [] ? > system_call_after_swapgs+0x15d/0x220 > Feb 19 16:55:53 nafhh kernel: [] ? > system_call_after_swapgs+0x156/0x220 > Feb 19 16:55:53 nafhh kernel: [] ? > system_call_after_swapgs+0x14f/0x220 > Feb 19 16:55:53 nafhh kernel: [] > system_call_fastpath+0x16/0x1b > Feb 19 16:55:53 nafhh kernel: [] ? > system_call_after_swapgs+0xca/0x220 > > > I am surprised, that top of the stack trace is in ext4 module. Can > some one put a light on it? > Ignore the symbols with '?' next to them as they are not reliable. If you want to map this to code use something like 'eu-addr2line' and see if the symbols make sense in terms of source code lines. This looks like the process is just blocked for a long time waiting on an NFS4 GETATTR to complete, and does not look unusual to me.