Subject: Re: kernel panics with 4.14.X versions
From: Pavlos Parissis
To: Jan Kara, Guillaume Morin
Cc: stable@vger.kernel.org, decui@microsoft.com, jack@suse.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, mszeredi@redhat.com
Date: Tue, 17 Apr 2018 01:31:24 +0200
Message-ID: <9b11cfba-4bdc-8a3e-cd33-2f7e8d513bdf@gmail.com>
In-Reply-To: <20180416144041.t2mt7ugzwqr56ka3@quack2.suse.cz>

On 16/04/2018 04:40 PM, Jan Kara wrote:
> On Mon 16-04-18 15:25:50, Guillaume Morin wrote:
>> Fwiw, there have been already reports of similar soft lockups in
>> fsnotify() on 4.14: https://lkml.org/lkml/2018/3/2/1038
>>
>> We have also noticed similar softlockups with 4.14.22 here.
>
> Yeah.
>
>> On 16 Apr 13:54, Pavlos Parissis wrote:
>>>
>>> Hi all,
>>> [..snip..]
>>> [373782.361064] watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [kube-apiserver:24261]
>>> [373782.378225] Modules linked in: binfmt_misc sctp_diag sctp dccp_diag dccp tcp_diag udp_diag
>>> inet_diag unix_diag cfg80211 rfkill dell_rbu 8021q garp mrp xfs libcrc32c loop x86_pkg_temp_thermal
>>> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
>>> pcbc aesni_intel vfat fat crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf iTCO_wdt ses
>>> iTCO_vendor_support mxm_wmi ipmi_si dcdbas enclosure mei_me pcspkr ipmi_devintf lpc_ich sg mei
>>> ipmi_msghandler mfd_core shpchp wmi acpi_power_meter netconsole nfsd auth_rpcgss nfs_acl lockd grace
>>> sunrpc ip_tables ext4 mbcache jbd2 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
>>> fb_sys_fops sd_mod ttm crc32c_intel ahci libahci mlx5_core drm mlxfw mpt3sas ptp libata raid_class
>>> pps_core scsi_transport_sas
>>> [373782.516807] dm_mirror dm_region_hash dm_log dm_mod dax
>>> [373782.531739] CPU: 24 PID: 24261 Comm: kube-apiserver Not tainted 4.14.32-1.el7.x86_64 #1
>>> [373782.549848] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.4.3 01/17/2017
>>> [373782.567486] task: ffff882f66d28000 task.stack: ffffc9002120c000
>>> [373782.583441] RIP: 0010:fsnotify+0x197/0x510
>>> [373782.597319] RSP: 0018:ffffc9002120fdb8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
>>> [373782.615308] RAX: 0000000000000000 RBX: ffff882f9ec65c20 RCX: 0000000000000002
>>> [373782.632950] RDX: 0000000000028700 RSI: 0000000000000002 RDI: ffffffff8269a4e0
>>> [373782.650616] RBP: ffffc9002120fe98 R08: 0000000000000000 R09: 0000000000000000
>>> [373782.668287] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>> [373782.685918] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>> [373782.703302] FS: 000000c42009f090(0000) GS:ffff882fbf900000(0000) knlGS:0000000000000000
>>> [373782.721887] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [373782.737741] CR2: 00007f82b6539244 CR3: 0000002f3de2a005 CR4: 00000000003606e0
>>> [373782.755247] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [373782.772722] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> [373782.790043] Call Trace:
>>> [373782.802041] vfs_write+0x151/0x1b0
>>> [373782.815081] ? syscall_trace_enter+0x1cd/0x2b0
>>> [373782.829175] SyS_write+0x55/0xc0
>>> [373782.841870] do_syscall_64+0x79/0x1b0
>>> [373782.855073] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>
> Can you please run RIP through ./scripts/faddr2line to see where exactly
> are we looping? I expect the loop iterating over marks to notify but better
> be sure.
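For reference, the RIP value in the oops reads as `CS:symbol+offset/size`. Here `0010` is the code-segment selector, and `fsnotify+0x197/0x510` points 0x197 bytes into `fsnotify`, a function 0x510 bytes long. As far as I understand the script's usage, `faddr2line` wants only the `symbol+offset/size` part, together with an object file that still has its symbols. The Python sketch below splits such a value into those fields and rebuilds the argument without the selector. The helper `parse_rip` is hypothetical and not part of the kernel tree.

```python
import re

# Hypothetical pattern for an x86_64 oops "RIP:" value such as
# "0010:fsnotify+0x197/0x510".
RIP_RE = re.compile(
    r"(?:(?P<cs>[0-9a-f]{4}):)?"    # optional code-segment selector, e.g. 0010
    r"(?P<sym>[A-Za-z0-9_.]+)"      # function symbol
    r"\+(?P<off>0x[0-9a-f]+)"       # offset into the function
    r"(?:/(?P<size>0x[0-9a-f]+))?"  # optional total size of the function
)


def parse_rip(rip):
    """Split a RIP value into its parts (hypothetical helper)."""
    m = RIP_RE.fullmatch(rip.strip())
    if m is None:
        raise ValueError(f"unrecognised RIP value: {rip!r}")
    size = int(m["size"], 16) if m["size"] else None
    # faddr2line takes symbol+offset[/size]; the "0010:" selector is dropped.
    arg = f"{m['sym']}+{m['off']}" + (f"/{m['size']}" if m["size"] else "")
    return {
        "cs": m["cs"],
        "symbol": m["sym"],
        "offset": int(m["off"], 16),
        "size": size,
        "faddr2line_arg": arg,
    }


info = parse_rip("0010:fsnotify+0x197/0x510")
print(info["faddr2line_arg"])        # fsnotify+0x197/0x510
print(info["offset"], info["size"])  # 407 1296
```

The same function also accepts call-trace entries such as `vfs_write+0x151/0x1b0`, which have no selector.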
>

I am very new to this, and I tried:
../repo/Linux/linux/scripts/faddr2line ./vmlinuz-4.14.32-1.el7.x86_64 0010:fsnotify+0x197/0x510
readelf: Error: Not an ELF file - it has the wrong magic bytes at the start
size: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag IMAGE_SCN_MEM_NOT_PAGED in section .bss
nm: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag IMAGE_SCN_MEM_NOT_PAGED in section .bss
nm: ./vmlinuz-4.14.32-1.el7.x86_64: no symbols
size: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag IMAGE_SCN_MEM_NOT_PAGED in section .bss
nm: ./vmlinuz-4.14.32-1.el7.x86_64: Warning: Ignoring section flag IMAGE_SCN_MEM_NOT_PAGED in section .bss
nm: ./vmlinuz-4.14.32-1.el7.x86_64: no symbols
no match for 0010:fsnotify+0x197/0x510

Obviously, I am doing something very wrong.

> How easily can you hit this?

Very easily; I only need to wait 1-2 days for a crash to occur.

> Are you able to run debug kernels

Well, I was under the impression that I do, as I have:
grep -E 'DEBUG_KERNEL|DEBUG_INFO' /boot/config-4.14.32-1.el7.x86_64
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_INFO_REDUCED is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
# CONFIG_DEBUG_INFO_DWARF4 is not set
CONFIG_DEBUG_KERNEL=y

Do you think that my kernel doesn't produce a proper crash dump?

I have a production cluster where I can run any kernel we need, so if I need to compile again with different settings, I can certainly do that.

> / inspect
> crash dumps when the issue occurs?

I can't do that, as the server isn't responsive and I can only power-cycle it.

> Also testing with the latest mainline
> kernel (4.16) would be welcome whether this isn't just an issue with the
> backport of fsnotify fixes from Miklos.

I can try kernel-ml-4.16.2 from elrepo (we use CentOS 7).

Thanks a lot for your reply.
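The readelf error ("wrong magic bytes") suggests that the file handed to `faddr2line` was the compressed boot image (`vmlinuz`), not an uncompressed ELF `vmlinux`. The `IMAGE_SCN_MEM_NOT_PAGED` warnings fit that reading: binutils appears to be parsing the PE/COFF header of an EFI-stub bzImage. The minimal sketch below tells the two apart by their leading bytes. It assumes the standard ELF magic and the x86 boot-protocol signature `HdrS` at offset 0x202. `kernel_image_kind` and `classify_file` are hypothetical helpers.

```python
# First four bytes of every ELF file, such as an uncompressed vmlinux.
ELF_MAGIC = b"\x7fELF"
# x86 boot protocol: a bzImage carries the signature "HdrS" at offset 0x202.
BZIMAGE_SIG = b"HdrS"
BZIMAGE_SIG_OFFSET = 0x202


def kernel_image_kind(data):
    """Classify a kernel image from its leading bytes (hypothetical helper)."""
    if data[:4] == ELF_MAGIC:
        # readelf/nm (and hence faddr2line) can read this, if it still has symbols.
        return "elf"
    if data[BZIMAGE_SIG_OFFSET:BZIMAGE_SIG_OFFSET + 4] == BZIMAGE_SIG:
        # Compressed bzImage; a leading "MZ" indicates a PE/COFF (EFI stub) header.
        return "bzimage-efi" if data[:2] == b"MZ" else "bzimage"
    return "unknown"


def classify_file(path):
    """Read just enough of a file to classify it (hypothetical helper)."""
    with open(path, "rb") as f:
        return kernel_image_kind(f.read(BZIMAGE_SIG_OFFSET + 4))


# Synthetic headers, so the sketch runs without a real kernel image.
fake_bzimage = bytearray(0x300)
fake_bzimage[0:2] = b"MZ"
fake_bzimage[BZIMAGE_SIG_OFFSET:BZIMAGE_SIG_OFFSET + 4] = BZIMAGE_SIG
print(kernel_image_kind(bytes(fake_bzimage)))     # bzimage-efi
print(kernel_image_kind(b"\x7fELF" + bytes(60)))  # elf
```

Suppose the check reports `elf` for a vmlinux that still has its symbols, such as the one left at the top of a kernel build tree built with CONFIG_DEBUG_INFO=y. The invocation would then take the form `./scripts/faddr2line vmlinux fsnotify+0x197/0x510`. Where a distribution ships such a file varies, so treat any specific path as an assumption.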
Pavlos Parissis
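The `grep` output quoted in the message follows the usual `.config` conventions. An enabled option is written `CONFIG_FOO=value`, and a disabled boolean appears as the comment `# CONFIG_FOO is not set`. The small sketch below parses such lines into a dictionary, so the debug-related options can be checked from a script. `parse_kconfig` is a hypothetical helper, and mapping unset options to `"n"` is a convention chosen for this sketch.

```python
import re

# "CONFIG_NAME=value" lines (value may be y, m, a number or a quoted string).
SET_RE = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
# "# CONFIG_NAME is not set" lines, written for disabled options.
UNSET_RE = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_kconfig(text):
    """Map CONFIG_* names to their values; disabled options map to "n".

    Hypothetical helper; any other comment or blank line is ignored.
    """
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = SET_RE.match(line)
        if m:
            # Drop leading/trailing quote characters from string values.
            opts[m.group(1)] = m.group(2).strip('"')
            continue
        m = UNSET_RE.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts


# The lines from the grep in the message above.
grep_output = """\
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_INFO_REDUCED is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
# CONFIG_DEBUG_INFO_DWARF4 is not set
CONFIG_DEBUG_KERNEL=y
"""

cfg = parse_kconfig(grep_output)
print(cfg["CONFIG_DEBUG_INFO"], cfg["CONFIG_DEBUG_INFO_REDUCED"])  # y n
```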