Message-ID: <20230603193439.502645149@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, Ashok Raj, Dave Hansen, Tony Luck, Arjan van de Veen,
    Peter Zijlstra, Eric Biederman
Subject: [patch 0/6] Cure kexec() vs. mwait_play_dead() troubles
Date: Sat, 3 Jun 2023 22:06:54 +0200 (CEST)
X-Mailing-List: linux-kernel@vger.kernel.org

Hi!

Ashok observed triple faults when executing kexec() on a kernel which has
'nosmt' on the kernel command line and HT enabled in the BIOS.

'nosmt' brings up the HT siblings to the point where they have initialized
the CPU and then rolls the bringup back, which parks them in
mwait_play_dead(). The reason is that all CPUs should have CR4.MCE set.
Otherwise a broadcast MCE will immediately shut down the machine.

Some detective work revealed that:

  1) The kexec kernel can overwrite text, pagetables, stack and data of the
     previous kernel.

  2) If the kexec kernel writes to the memory which is monitored by an
     "offline" CPU, that CPU resumes execution. That's obviously doomed
     when the kexec kernel overwrote text, pagetables, data or stack.

While on my test machine the first kexec() after reset always "worked",
the second one reliably ended up in a triple fault.

The following series cures this by:

  1) Bringing offline CPUs which are stuck in mwait_play_dead() out of
     mwait by writing to the monitored cacheline

  2) Letting the woken up CPUs check the written control word and drop
     into a HLT loop if the control word requests so.

     This is only half safe because HLT can resume execution due to NMI,
     SMI and MCE. Unfortunately there is no really safe mechanism to
     "park" a CPU reliably, but there is at least one which rules out NMI
     and SMI as wakeup causes: INIT.

  3) If the system uses the regular INIT/STARTUP sequence to wake up
     secondary CPUs, then "park" all CPUs including the "offline" ones
     by sending them INIT IPIs.

The INIT IPI brings the CPU into a wait-for-wakeup state which is not
affected by NMI and SMI, but INIT also clears CR4.MCE, so the broadcast
MCE problem comes back.

But that's not really any different from a CPU sitting in the HLT loop on
the previous kernel. If a broadcast MCE arrives, HLT resumes execution and
the CPU tries to handle the MCE on overwritten text, pagetables etc.

So parking them via INIT does not completely solve the problem, but it
at least takes NMI and SMI out of the picture.

The series is also available from git:

    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/kexec

Thanks,

	tglx
---
 include/asm/smp.h |    4 +
 kernel/smp.c      |   62 +++++++++++++----------
 kernel/smpboot.c  |  151 ++++++++++++++++++++++++++++++++++++++++--------------
 3 files changed, 156 insertions(+), 61 deletions(-)
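
For readers who want the parking states from the cover letter in compact form, here is a toy userspace model in C. It is a sketch only, not code from the series: every name in it (sim_cpu, CTRL_GOTO_HLT, sim_kick_out_of_mwait() and so on) is hypothetical, and the control word encoding is an assumption. The hazard table follows the cover letter for HLT and INIT. The mwait row additionally assumes that mwait also exits on NMI, SMI and MCE, which is my reading of the Intel SDM and is not stated in the cover letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical control word values; the real series may encode this differently. */
#define CTRL_STAY_DEAD 0u	/* keep waiting in mwait */
#define CTRL_GOTO_HLT  1u	/* kexec in progress: leave mwait, enter HLT loop */

enum park_state { PARK_MWAIT, PARK_HLT, PARK_INIT };
enum wake_event { EV_MEM_WRITE, EV_NMI, EV_SMI, EV_MCE, EV_COUNT };

/* One simulated "offline" CPU; 'control' stands in for the monitored cacheline. */
struct sim_cpu {
	uint32_t control;
	enum park_state state;
};

/* What the parked CPU does once mwait returns: check the control word. */
void sim_mwait_return(struct sim_cpu *cpu)
{
	assert(cpu != NULL);
	if (cpu->state == PARK_MWAIT && cpu->control == CTRL_GOTO_HLT)
		cpu->state = PARK_HLT;
}

/* kexec side, steps 1 and 2: the write to the monitored line wakes the CPU. */
void sim_kick_out_of_mwait(struct sim_cpu *cpu)
{
	assert(cpu != NULL);
	cpu->control = CTRL_GOTO_HLT;
	sim_mwait_return(cpu);
}

/* kexec side, step 3: an INIT IPI forces the wait-for-wakeup state. */
void sim_send_init(struct sim_cpu *cpu)
{
	assert(cpu != NULL);
	cpu->state = PARK_INIT;
}

/* True if 'ev' can still hurt a CPU parked in 'state' while kexec overwrites memory. */
bool sim_event_is_hazard(enum park_state state, enum wake_event ev)
{
	switch (state) {
	case PARK_MWAIT:
		/* A write to the monitored line wakes the CPU; assumed: so do NMI, SMI, MCE. */
		return true;
	case PARK_HLT:
		/* HLT ignores plain memory writes but resumes on NMI, SMI and MCE. */
		return ev != EV_MEM_WRITE;
	case PARK_INIT:
		/* NMI and SMI are ignored, but CR4.MCE is clear: a broadcast MCE is fatal. */
		return ev == EV_MCE;
	}
	return true;	/* unknown state: assume the worst */
}

/* Number of modelled events that remain hazardous in 'state'. */
int sim_count_hazards(enum park_state state)
{
	int n = 0;

	for (int ev = 0; ev < EV_COUNT; ev++)
		n += sim_event_is_hazard(state, (enum wake_event)ev) ? 1 : 0;
	return n;
}
```

Starting a sim_cpu in PARK_MWAIT with CTRL_STAY_DEAD, then calling sim_kick_out_of_mwait() followed by sim_send_init(), walks it through the three states in the order the series uses them. sim_count_hazards() shrinks at each step, and only the MCE case is left after INIT, which matches the "not completely solving the problem" caveat above.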