From: Hou Tao
To: linux-kernel@vger.kernel.org
Cc: Thomas Gleixner
Subject: [RH72 Spectre] ibpb_enabled = 1 leads to hard LOCKUP under x86_64 host machine
Message-ID: <12e55119-4f5b-da63-2b1c-14fb70243b21@huawei.com>
Date: Sat, 20 Jan 2018 17:00:33 +0800

Hi all,

We are testing the Spectre and Meltdown patches on an OS derived from RHEL 7.2 and have hit a hard LOCKUP panic on an x86_64 host machine. The lockup is reproducible: it goes away when we disable IBPB by writing 0 to the ibpb_enabled file, and it comes back as soon as we enable IBPB again (writing 1 or 2).
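For reference, we toggle the knob like this (a minimal sketch; the debugfs path below is assumed from the RHEL 7 Spectre knobs under /sys/kernel/debug/x86/, and may differ on other kernels):

  echo 0 > /sys/kernel/debug/x86/ibpb_enabled   # disable IBPB: the lockup goes away
  echo 1 > /sys/kernel/debug/x86/ibpb_enabled   # enable IBPB: the lockup comes back
  cat /sys/kernel/debug/x86/ibpb_enabled        # check the current setting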
The workload running on the host is just starting two hundred security containers sequentially, then stopping them and repeating. The security container is implemented using docker and kvm, so there are many "docker-containerd-shim" and "qemu-system-x86_64uvm" processes. Reproduction of the hard LOCKUP can be accelerated by running the following command ("hackbench" comes from the ltp project):

  while true; do ./hackbench 100 process 1000; done

We have saved vmcore files for the hard LOCKUPs by using kdump. The hard LOCKUPs are triggered by different processes and on different kernel stacks. We have analyzed one of them: it is caused by wake_up_new_task() when it tries to take rq->lock by invoking __task_rq_lock(). The value of the lock is 422320416 (head = 6432, tail = 6444; see the decode sketch after the crash output below). We have found the five processes that are waiting on the lock, but we cannot find the process that holds it.

We suspect something is wrong in the CPU scheduler, because the RSP register of the process runv, which is waiting for rq->lock, is incorrect: RSP points into the stack of swapper/57, while runv is also reported as running on CPU 57 (more details at the end of this mail). The same phenomenon shows up in the other hard LOCKUPs.

So, has anyone encountered a similar problem before? Any suggestions or directions for debugging these hard LOCKUPs would be appreciated.

Thanks,
Tao

---
The following lines are output from one instance of the hard LOCKUP panics:

* output from crash, which complains about the unexpected RSP register:

crash: inconsistent active task indications for CPU 57:
        runqueue: ffff882eac72e780 "runv" (default)
    current_task: ffff882f768e1700 "swapper/57"

crash> runq -m -c 57
CPU 57: [0 00:00:00.000]  PID: 8173   TASK: ffff882eac72e780  COMMAND: "runv"

crash> bt 8173
PID: 8173   TASK: ffff882eac72e780  CPU: 57   COMMAND: "runv"
 #0 [ffff885fbe145e00] stop_this_cpu at ffffffff8101f66d
 #1 [ffff885fbe145e10] kbox_rlock_stop_other_cpus_call at ffffffffa031e649
 #2 [ffff885fbe145e50] smp_nmi_call_function_handler at ffffffff81047dd6
 #3 [ffff885fbe145e68] nmi_handle at ffffffff8164fc09
 #4 [ffff885fbe145eb0] do_nmi at ffffffff8164fd84
 #5 [ffff885fbe145ef0] end_repeat_nmi at ffffffff8164eff9
    [exception RIP: _raw_spin_lock+48]
    RIP: ffffffff8164dc50  RSP: ffff882f768f3b18  RFLAGS: 00000002
    RAX: 0000000000000a58  RBX: ffff882f76f1d080  RCX: 0000000000001920
    RDX: 0000000000001922  RSI: 0000000000001922  RDI: ffff885fbe159580
    RBP: ffff882f768f3b18   R8: 0000000000000012   R9: 0000000000000001
    R10: 0000000000000400  R11: 0000000000000000  R12: ffff882f76f1d884
    R13: 0000000000000046  R14: ffff885fbe159580  R15: 0000000000000039
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #6 [ffff882f768f3b18] _raw_spin_lock at ffffffff8164dc50
bt: cannot transition from exception stack to current process stack:
        exception stack pointer: ffff885fbe145e00
          process stack pointer: ffff882f768f3b18
             current stack base: ffff882f34e38000
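* decoding the rq->lock value (a quick sketch)

The lock word decodes as follows, assuming the 3.10-era x86 ticket-spinlock layout (head in the low 16 bits, tail in the high 16 bits) and a ticket increment of 2 as used with paravirt spinlocks, which matches RCX=0x1920 / RDX=0x1922 in the register dump above:

  val=422320416                 # raw rq->lock value from the vmcore
  head=$(( val & 0xffff ))      # 6432 (0x1920): ticket currently being served
  tail=$(( val >> 16 ))         # 6444 (0x192c): next ticket to hand out
  inc=2                         # ticket increment with paravirt spinlocks
  echo "head=$head tail=$tail holder+waiters=$(( (tail - head) / inc ))"
  # prints: head=6432 tail=6444 holder+waiters=6
  # i.e. one holder plus five waiters -- we found the five waiters in the
  # vmcore but not the holder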
* kernel panic message when the hard LOCKUP occurs

[ 4396.807556] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 55
[ 4396.807561] CPU: 55 PID: 8267 Comm: docker Tainted: G O ---- ------- 3.10.0-327.59.59.46.x86_64 #1
[ 4396.807563] Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 1.57 08/11/2015
[ 4396.807564] Call Trace:
[ 4396.807571]  [] dump_stack+0x19/0x1b
[ 4396.807575]  [] panic+0xd8/0x214
[ 4396.807582]  [] watchdog_overflow_callback+0xd1/0xe0
[ 4396.807589]  [] __perf_event_overflow+0xa1/0x250
[ 4396.807595]  [] perf_event_overflow+0x14/0x20
[ 4396.807600]  [] intel_pmu_handle_irq+0x1e8/0x470
[ 4396.807610]  [] ? ioremap_page_range+0x241/0x320
[ 4396.807617]  [] ? ghes_copy_tofrom_phys+0x124/0x210
[ 4396.807621]  [] ? ghes_read_estatus+0xa0/0x190
[ 4396.807626]  [] perf_event_nmi_handler+0x2b/0x50
[ 4396.807629]  [] nmi_handle.isra.0+0x69/0xb0
[ 4396.807633]  [] do_nmi+0x134/0x410
[ 4396.807637]  [] end_repeat_nmi+0x1e/0x7e
[ 4396.807643]  [] ? _raw_spin_lock+0x3a/0x50
[ 4396.807648]  [] ? _raw_spin_lock+0x3a/0x50
[ 4396.807653]  [] ? _raw_spin_lock+0x3a/0x50
[ 4396.807658]  <<EOE>>  [] wake_up_new_task+0x9c/0x170
[ 4396.807662]  [] do_fork+0x13b/0x320
[ 4396.807667]  [] SyS_clone+0x16/0x20
[ 4396.807672]  [] stub_clone+0x44/0x70
[ 4396.807676]  [] ? system_call_fastpath+0x16/0x1b

* cpu info for the first CPU (72 CPUs in total)

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 63
model name      : Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
stepping        : 2
microcode       : 0x3b
cpu MHz         : 2300.000
cache size      : 46080 KB
physical id     : 0
siblings        : 36
core id         : 0
cpu cores       : 18
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 15
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm arat epb invpcid_single pln pts dtherm spec_ctrl ibpb_support tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc
bogomips        : 4589.42
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management: