Date: Tue, 17 Apr 2018 14:04:51 +0200
From: Jan Kara
To: Pavlos Parissis
Cc: Jan Kara, Guillaume Morin, stable@vger.kernel.org, decui@microsoft.com,
 jack@suse.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 mszeredi@redhat.com
Subject: Re: kernel panics with 4.14.X versions
Message-ID: <20180417120451.xixjctt2a4kwvpc4@quack2.suse.cz>
References: <20180416132550.d25jtdntdvpy55l3@bender.morinfr.org>
 <20180416144041.t2mt7ugzwqr56ka3@quack2.suse.cz>
 <9b11cfba-4bdc-8a3e-cd33-2f7e8d513bdf@gmail.com>
 <134eb955-fae7-9fd0-946e-787986509d7b@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <134eb955-fae7-9fd0-946e-787986509d7b@gmail.com>
User-Agent: NeoMutt/20170421 (1.8.2)
Sender:
 linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 17-04-18 13:18:46, Pavlos Parissis wrote:
> On 17/04/2018 01:31 AM, Pavlos Parissis wrote:
> > On 16/04/2018 04:40 PM, Jan Kara wrote:
> >> On Mon 16-04-18 15:25:50, Guillaume Morin wrote:
> >>> Fwiw, there have already been reports of similar soft lockups in
> >>> fsnotify() on 4.14: https://lkml.org/lkml/2018/3/2/1038
> >>>
> >>> We have also noticed similar soft lockups with 4.14.22 here.
> >>
> >> Yeah.
> >>
> >>> On 16 Apr 13:54, Pavlos Parissis wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >
> > [..snip..]
> >
> >>>> [373782.361064] watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [kube-apiserver:24261]
> >>>> [373782.378225] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
> >>>> inet_diag unix_diag cfg80211 rfkill dell_rbu 8021q garp mrp xfs libcrc32c loop x86_pkg_temp_thermal
> >>>> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
> >>>> pcbc aesni_intel vfat fat crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf iTCO_wdt ses
> >>>> iTCO_vendor_support mxm_wmi ipmi_si dcdbas enclosure mei_me pcspkr ipmi_devintf lpc_ich sg mei
> >>>> ipmi_msghandler mfd_core shpchp wmi acpi_power_meter netconsole nfsd auth_rpcgss nfs_acl lockd grace
> >>>> sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
> >>>> fb_sys_fops sd_mod ttm crc32c_intel ahci libahci mlx5_core drm mlxfw mpt3sas ptp libata raid_class
> >>>> pps_core scsi_transport_sas
> >>>> [373782.516807] dm_mirror dm_region_hash dm_log dm_mod dax
> >>>> [373782.531739] CPU: 24 PID: 24261 Comm: kube-apiserver Not tainted 4.14.32-1.el7.x86_64 #1
> >>>> [373782.549848] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.4.3 01/17/2017
> >>>> [373782.567486] task: ffff882f66d28000 task.stack: ffffc9002120c000
> >>>> [373782.583441] RIP: 0010:fsnotify+0x197/0x510
> >>>> [373782.597319] RSP: 0018:ffffc9002120fdb8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
> >>>> [373782.615308] RAX: 0000000000000000 RBX: ffff882f9ec65c20 RCX: 0000000000000002
> >>>> [373782.632950] RDX: 0000000000028700 RSI: 0000000000000002 RDI: ffffffff8269a4e0
> >>>> [373782.650616] RBP: ffffc9002120fe98 R08: 0000000000000000 R09: 0000000000000000
> >>>> [373782.668287] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> >>>> [373782.685918] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> >>>> [373782.703302] FS: 000000c42009f090(0000) GS:ffff882fbf900000(0000) knlGS:0000000000000000
> >>>> [373782.721887] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>> [373782.737741] CR2: 00007f82b6539244 CR3: 0000002f3de2a005 CR4: 00000000003606e0
> >>>> [373782.755247] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>> [373782.772722] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>> [373782.790043] Call Trace:
> >>>> [373782.802041] vfs_write+0x151/0x1b0
> >>>> [373782.815081] ? syscall_trace_enter+0x1cd/0x2b0
> >>>> [373782.829175] SyS_write+0x55/0xc0
> >>>> [373782.841870] do_syscall_64+0x79/0x1b0
> >>>> [373782.855073] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> >>
> >> Can you please run RIP through ./scripts/faddr2line to see where exactly
> >> we are looping? I expect it is the loop iterating over marks to notify,
> >> but better to be sure.
> >>
> >
> > I am very new to this and I tried:
> > ../repo/Linux/linux/scripts/faddr2line ./vmlinuz-4.14.32-1.el7.x86_64
> > 0010:fsnotify+0x197/0x510
> > readelf: Error: Not an ELF file - it has the wrong magic bytes at the start
> > size: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
> > IMAGE_SCN_MEM_NOT_PAGED in section .bss
> > nm: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
> > IMAGE_SCN_MEM_NOT_PAGED in section .bss
> > nm: ./vmlinuz-4.14.32-1.el7.x86_64: no symbols
> > size: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
> > IMAGE_SCN_MEM_NOT_PAGED in section .bss
> > nm: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag
> > IMAGE_SCN_MEM_NOT_PAGED in section .bss
> > nm: ./vmlinuz-4.14.32-1.el7.x86_64: no symbols
> > no match for 0010:fsnotify+0x197/0x510
> >
> > Obviously, I am doing something very wrong.
> >
>
> I produced an uncompressed image (the error above was caused by giving a
> compressed image to faddr2line) by compiling 4.14.32 with the config which
> we have in production, and now faddr2line reports:
>
> ../repo/Linux/linux/scripts/faddr2line ./vmlinux 0010:fsnotify+0x197/0x510
> no match for 0010:fsnotify+0x197/0x510
>
>
> ../repo/Linux/linux/scripts/faddr2line ./vmlinux fsnotify+0x197/0x510
>
>
>
> skipping fsnotify address at 0xffffffff8129baf7 due to size mismatch (0x510 != 0x520)
> no match for fsnotify+0x197/0x510
>
> What am I doing wrong?

Apparently the compiler compiled that function slightly differently than what
you have in production: the trace reports fsnotify() as 0x510 bytes long, while
in your rebuilt vmlinux it is 0x520 bytes, so faddr2line refuses the match. You
need to have the original vmlinux file from the machine which crashed to be
able to use faddr2line.

								Honza
-- 
Jan Kara
SUSE Labs, CR
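
For readers following along, the two failures in this thread (the "Not an ELF file" error and the "size mismatch" skip) can be illustrated with a minimal Python sketch. This is not how faddr2line is implemented; the names parse_sym_addr, looks_like_elf and size_matches are hypothetical helpers introduced only for illustration, and the symbol size has to be supplied by hand (for example, read from nm output on the vmlinux in question).

```python
import re
from typing import NamedTuple, Optional

# First four bytes of any ELF file; a compressed vmlinuz does not start with them.
ELF_MAGIC = b"\x7fELF"


class SymAddr(NamedTuple):
    """A 'symbol+offset/size' location as printed in a kernel oops."""
    symbol: str
    offset: int
    size: Optional[int]


# Matches func+0xoff[/0xsize], optionally preceded by a code-segment
# prefix such as "0010:" (as seen on the RIP line of the trace).
_PATTERN = re.compile(
    r"^(?:[0-9a-fA-F]{4}:)?"      # optional segment selector, e.g. 0010:
    r"([A-Za-z_][\w.]*)"          # symbol name
    r"\+0x([0-9a-fA-F]+)"         # offset into the function
    r"(?:/0x([0-9a-fA-F]+))?$"    # optional function size
)


def parse_sym_addr(text: str) -> SymAddr:
    """Parse e.g. '0010:fsnotify+0x197/0x510' (hypothetical helper)."""
    m = _PATTERN.match(text.strip())
    if m is None:
        raise ValueError(f"not a symbol+offset/size string: {text!r}")
    symbol, offset, size = m.groups()
    return SymAddr(symbol, int(offset, 16), int(size, 16) if size else None)


def looks_like_elf(header: bytes) -> bool:
    """True if the leading bytes of a file carry the ELF magic."""
    return header[:4] == ELF_MAGIC


def size_matches(addr: SymAddr, size_in_vmlinux: int) -> bool:
    """Mirror the idea behind the 'size mismatch' message: the size recorded
    in the trace must equal the symbol's size in the vmlinux being queried."""
    return addr.size is None or addr.size == size_in_vmlinux


if __name__ == "__main__":
    addr = parse_sym_addr("0010:fsnotify+0x197/0x510")
    print(addr.symbol, hex(addr.offset), hex(addr.size))
    # 0x520 is the fsnotify size reported for the rebuilt vmlinux above.
    print(size_matches(addr, 0x520))
```

With the values from this thread, the size from the trace (0x510) differs from the size in the rebuilt image (0x520), which is the situation the reply describes: only a vmlinux from the exact build that crashed will have matching function sizes.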