From: "H.J. Lu"
Date: Mon, 9 Jul 2018 16:02:30 -0700
Subject: Re: Kernel 4.17.4 lockup
To: Dave Hansen
Cc: "H. Peter Anvin", Matthew Wilcox, LKML
In-Reply-To: <50d6bb50-5fa4-33d1-1f88-3844d0237f16@intel.com>
References: <50d6bb50-5fa4-33d1-1f88-3844d0237f16@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jul 9, 2018 at 7:54 AM, Dave Hansen wrote:
> On 07/09/2018 06:19 AM, Lu, Hongjiu wrote:
>> On 3 x86-64 machines, kernel 4.17.4 locked up under heavy load. 2 of
> them don't have any kernel messages. One has
>
> Hi H.J.,
>
> It'd be really handy if you could pastebin things like this, or attach a
> text file with the oops. Your email wrapped the heck out of the oops and
> I had to go and unwrap it to read it.
>
> A full disassembly of free_pages_and_swap_cache() from the actual
> vmlinux to account for differences between toolchains would be helpful.
> It'll probably help me figure out what the loop counter was for
> instance. It makes it a bit easier to read random oopses if you boot
> with 'nokaslr' because the pointer types (kernel text, linear map,
> vmemmap, etc...) stick out much more easily. Would you be able to boot
> with it in the future?

I will do that if it happens again.

> We've had a bit of churn in that code, but nothing between 4.16 and 4.17
> that really sticks out to me in the x86 code.
>
> The general protection fault is a bit of an oddball. If I disassembled
> right, it's trying to dereference %R13+20. That doesn't even cross a

This is correct.

> page boundary, so it's a bit hard to fathom where the #GP would come from.

(gdb) disass free_pages_and_swap_cache
Dump of assembler code for function free_pages_and_swap_cache:
   0xffffffff8124c0d0 <+0>:    callq  0xffffffff81a017a0 <__fentry__>
   0xffffffff8124c0d5 <+5>:    push   %r14
   0xffffffff8124c0d7 <+7>:    push   %r13
   0xffffffff8124c0d9 <+9>:    push   %r12
   0xffffffff8124c0db <+11>:   mov    %rdi,%r12
   0xffffffff8124c0de <+14>:   push   %rbp
   0xffffffff8124c0df <+15>:   mov    %esi,%ebp
   0xffffffff8124c0e1 <+17>:   push   %rbx
   0xffffffff8124c0e2 <+18>:   callq  0xffffffff81205a10
   0xffffffff8124c0e7 <+23>:   test   %ebp,%ebp
   0xffffffff8124c0e9 <+25>:   jle    0xffffffff8124c156
   0xffffffff8124c0eb <+27>:   lea    -0x1(%rbp),%eax
   0xffffffff8124c0ee <+30>:   mov    %r12,%rbx
   0xffffffff8124c0f1 <+33>:   lea    0x8(%r12,%rax,8),%r14
   0xffffffff8124c0f6 <+38>:   mov    (%rbx),%r13
   0xffffffff8124c0f9 <+41>:   mov    0x20(%r13),%rdx   <<<<<<<<<<<<<<<<<<<< GPF here.
   0xffffffff8124c0fd <+45>:   lea    -0x1(%rdx),%rax
   0xffffffff8124c101 <+49>:   and    $0x1,%edx
   0xffffffff8124c104 <+52>:   cmove  %r13,%rax
   0xffffffff8124c108 <+56>:   mov    0x20(%rax),%rcx
   0xffffffff8124c10c <+60>:   lea    -0x1(%rcx),%rdx
   0xffffffff8124c110 <+64>:   and    $0x1,%ecx
   0xffffffff8124c113 <+67>:   cmove  %rax,%rdx
   0xffffffff8124c117 <+71>:   mov    (%rdx),%rdx
   0xffffffff8124c11a <+74>:   test   $0x40000,%edx
   0xffffffff8124c120 <+80>:   je     0xffffffff8124c14d
   0xffffffff8124c122 <+82>:   mov    (%rax),%rax
   0xffffffff8124c125 <+85>:   test   $0x2,%ah
   0xffffffff8124c128 <+88>:   je     0xffffffff8124c14d
   0xffffffff8124c12a <+90>:   mov    %r13,%rdi
   0xffffffff8124c12d <+93>:   callq  0xffffffff81218260
   0xffffffff8124c132 <+98>:   test   %al,%al
   0xffffffff8124c134 <+100>:  jne    0xffffffff8124c14d
   0xffffffff8124c136 <+102>:  mov    0x20(%r13),%rdx
   0xffffffff8124c13a <+106>:  lea    -0x1(%rdx),%rax
   0xffffffff8124c13e <+110>:  and    $0x1,%edx
   0xffffffff8124c141 <+113>:  cmove  %r13,%rax
   0xffffffff8124c145 <+117>:  lock btsq $0x0,(%rax)
   0xffffffff8124c14b <+123>:  jae    0xffffffff8124c168
   0xffffffff8124c14d <+125>:  add    $0x8,%rbx
   0xffffffff8124c151 <+129>:  cmp    %rbx,%r14
   0xffffffff8124c154 <+132>:  jne    0xffffffff8124c0f6
   0xffffffff8124c156 <+134>:  pop    %rbx
   0xffffffff8124c157 <+135>:  mov    %ebp,%esi
   0xffffffff8124c159 <+137>:  mov    %r12,%rdi
   0xffffffff8124c15c <+140>:  pop    %rbp
   0xffffffff8124c15d <+141>:  pop    %r12
   0xffffffff8124c15f <+143>:  pop    %r13
   0xffffffff8124c161 <+145>:  pop    %r14
   0xffffffff8124c163 <+147>:  jmpq   0xffffffff81204600
   0xffffffff8124c168 <+152>:  mov    %r13,%rdi
   0xffffffff8124c16b <+155>:  callq  0xffffffff81250aa0
   0xffffffff8124c170 <+160>:  mov    %r13,%rdi
   0xffffffff8124c173 <+163>:  callq  0xffffffff811ed140
   0xffffffff8124c178 <+168>:  jmp    0xffffffff8124c14d
End of assembler dump.
(gdb)

> (mostly) unwrapped oops below.
>
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: general protection
>>> fault: 0000 [#1] SMP PTI
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: Modules linked in:
>>> rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache devlink ebtable_filter
>>> ebtables ip6table_filter ip6_tables intel_rapl x86_pkg_temp_thermal
>>> intel_powerclamp coretemp snd_hda_codec_hdmi snd_hda_codec_realtek
>>> kvm_intel snd_hda_codec_generic snd_hda_intel kvm snd_hda_codec
>>> snd_hda_core snd_hwdep irqbypass crct10dif_pclmul crc32_pclmul snd_seq
>>> mei_wdt ghash_clmulni_intel snd_seq_device intel_cstate ppdev
>>> intel_uncore iTCO_wdt gpio_ich iTCO_vendor_support snd_pcm
>>> intel_rapl_perf snd_timer snd mei_me parport_pc joydev i2c_i801 mei
>>> soundcore shpchp lpc_ich parport nfsd auth_rpcgss nfs_acl lockd grace
>>> sunrpc i915 i2c_algo_bit drm_kms_helper r8169 drm crc32c_intel mii
>>> video
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: CPU: 7 PID: 7093 Comm:
>>> cc1 Not tainted 4.17.4-200.0.fc28.x86_64 #1
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: Hardware name: Gigabyte
>>> Technology Co., Ltd. H87M-D3H/H87M-D3H, BIOS F11 08/18/2015
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RIP: 0010:free_pages_and_swap_cache+0x29/0xb0
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RSP: 0018:ffffb2cd83ffbd58 EFLAGS: 00010202
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RAX: 0017fffe00040068 RBX: ffff93d4abb5ec80 RCX: 0000000000000000
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RDX: 0017fffe00040068 RSI: 00000000000001fe RDI: ffff93d51e3dd2a0
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RBP: 00000000000001fe R08: fffff0809df82d20 R09: ffff93d51e5d5000
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: R10: ffff93d51e5d5e20 R11: ffff93d51e5d5d00 R12: ffff93d4abb5e010
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: R13: fffbf0809e304bc0 R14: ffff93d4abb5f000 R15: ffff93d4cbcee8f0
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: FS: 0000000000000000(0000) GS:ffff93d51e3c0000(0000) knlGS:0000000000000000
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: CR2: 00007ffb255e753c CR3: 00000005e820a002 CR4: 00000000001606e0
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: Call Trace:
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: tlb_flush_mmu_free+0x31/0x50
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: arch_tlb_finish_mmu+0x42/0x70
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: tlb_finish_mmu+0x1f/0x30
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: exit_mmap+0xca/0x190
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: mmput+0x5f/0x130
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: do_exit+0x280/0xae0
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: ? __do_page_fault+0x263/0x4e0
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: do_group_exit+0x3a/0xa0
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: __x64_sys_exit_group+0x14/0x20
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: do_syscall_64+0x65/0x160
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RIP: 0033:0x7ffb2542b3c6
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RSP: 002b:00007ffd9e7e33b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RAX: ffffffffffffffda RBX: 00007ffb2551c740 RCX: 00007ffb2542b3c6
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RBP: 0000000000000000 R08: 00000000000000e7 R09: fffffffffffffe70
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: R10: 00007ffd9e7e3250 R11: 0000000000000246 R12: 00007ffb2551c740
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: R13: 0000000000000037 R14: 00007ffb25525708 R15: 0000000000000000
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: Code: 40 00 0f 1f 44 00 00 41 56 41 55 41 54 49 89 fc 55 89 f5 53 e8 29 99 fb ff 85 ed 7e 6b 8d 45 ff 4c 89 e3 4d 8d 74 c4 08 4c 8b 2b <49> 8b 55 20 48 8d 42 ff 83 e2 01 49 0f 44 c5 48 8b 48 20 48 8d
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: RIP: free_pages_and_swap_cache+0x29/0xb0 RSP: ffffb2cd83ffbd58
>>> Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: ---[ end trace 5960277fd8a3c0b5 ]---

-- 
H.J.
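
For anyone cross-checking the oops against the gdb listing, here is a minimal sketch of two checks that need only the values already quoted in this message. It is an illustration, not part of the original report. The first check maps the RIP field (symbol+offset/size) onto the disassembly address. The second tests whether the pointer in R13 is a canonical x86-64 address, since a load through a non-canonical address raises #GP rather than a page fault. The helper names (parse_rip, is_canonical, mismatched_upper_bits) are made up for this example, and VA_BITS = 48 is an assumption (4-level paging); the check says nothing about how the pointer got its value.

```python
import re

# Values copied from the oops and the gdb listing in this message.
FUNC_START = 0xFFFFFFFF8124C0D0   # <+0> of free_pages_and_swap_cache
RIP_FIELD = "free_pages_and_swap_cache+0x29/0xb0"
R13 = 0xFFFBF0809E304BC0          # base register of the faulting load
R08 = 0xFFFFF0809DF82D20          # another pointer from the same register dump

VA_BITS = 48  # assumption: 4-level paging, 48-bit virtual addresses


def parse_rip(field):
    """Split 'symbol+0xOFF/0xSIZE' into (symbol, offset, size)."""
    m = re.fullmatch(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", field)
    if m is None:
        raise ValueError(f"unrecognised RIP field: {field!r}")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


def is_canonical(addr, va_bits=VA_BITS):
    """True if bits 63..(va_bits - 1) are all equal (sign-extended)."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


def mismatched_upper_bits(addr, va_bits=VA_BITS):
    """Bit positions above va_bits - 1 that differ from bit va_bits - 1."""
    sign = (addr >> (va_bits - 1)) & 1
    return [b for b in range(va_bits, 64) if ((addr >> b) & 1) != sign]


if __name__ == "__main__":
    symbol, offset, size = parse_rip(RIP_FIELD)
    print(f"{symbol}: fault at <+{offset}> = {FUNC_START + offset:#x}")
    for name, value in (("R13", R13), ("R13+0x20", R13 + 0x20), ("R08", R08)):
        print(f"{name} = {value:#018x} canonical: {is_canonical(value)}")
    print("R13 upper bits that break sign extension:", mismatched_upper_bits(R13))
```

The offset 0x29 is 41 decimal, so the RIP lands on the instruction marked "GPF here" in the listing. Under the 48-bit assumption the script reports R13 as non-canonical, which would be consistent with a #GP on a load that does not cross a page boundary.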