Received: by 2002:a25:ab43:0:0:0:0:0 with SMTP id u61csp6892942ybi; Wed, 29 May 2019 15:12:02 -0700 (PDT) X-Google-Smtp-Source: APXvYqzjLo5FRCMANsS0MlOio+sgdtaQK3fiqfrhLO1Z6ZGZe5nyTRcSzTdP+8rGr0fN11iKFYBt X-Received: by 2002:a17:902:1126:: with SMTP id d35mr242915pla.82.1559167922113; Wed, 29 May 2019 15:12:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1559167922; cv=none; d=google.com; s=arc-20160816; b=dl0zh252BiRVf2wFUASDmRp2ICbdoujtD1z5ni5y/naByjwJQm6I+gPdbRZoTpWXz4 SBsKlzFyNqGgl/SSC545ikP0SGiyNvrWBPKdAJHsJqe4lScE8rcaTsNu0Ia7oMxxP6QH s9LSfwat/yVwHtU3nd0IIh/B4OFFIrIIk9eQlS779tKCyi6zzRBCSCEXL9tcFhkUkRl8 Fj57ALFbcnqCSY/gaw65DkigsRoccnHCA9S07EWYRuIgk4T0B3MFbJlyctkfkA92bcei X8EV4DV26CigAUavX2FJRYDm5wSV4FgK5QV9o30reAjv7k/ACbe97dJQQRy8dOTTP5Ng Dldw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :message-id:in-reply-to:subject:cc:to:from:date:dkim-signature; bh=dJhc4vSuSkSfaKTBwI7eqRjDXiflj2sX09ISgP8FHMg=; b=T1q+tRLFveK4wklzRFXZG6uREO7bBsbNJGJARQZeQSCKR6OObyeweCJfxsgPDU8f+J uQJqgTuf5HXZD3yfXTwC8hCYSR31V5WrnYL15mRqGNBy/X119QWlzqcctxMb28s0+iW6 SrnrPkQlYIMuHK5mm4UR0PhUlv9kKBBNgyuiPLO66mFtAEfIeI4tfBv+2uZPmVa9cpJx pkrRAS+vdxVeXW4o4UBUiugGN2+zhWPKZCCEQEgdG5Ci/eNAEk57BQHO4HIUFj8GcBcy +mqw2cpWRyPxWWD0E2q5ALTJjc3aL8i9YiDio9rA6XYVimrgCs6LKabfsnnfWrAYuiVS ywNA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=KOURBKUu; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id t35si1043617pgk.461.2019.05.29.15.11.39; Wed, 29 May 2019 15:12:02 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=KOURBKUu; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726546AbfE2WJp (ORCPT + 99 others); Wed, 29 May 2019 18:09:45 -0400 Received: from mail.kernel.org ([198.145.29.99]:49784 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726054AbfE2WJp (ORCPT ); Wed, 29 May 2019 18:09:45 -0400 Received: from pobox.suse.cz (prg-ext-pat.suse.com [213.151.95.130]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 868B52075B; Wed, 29 May 2019 22:09:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1559167783; bh=3OmN7Ed90J/s26tvuNKReiDT2xMzEGrBWFHugyhQnlw=; h=Date:From:To:cc:Subject:In-Reply-To:References:From; b=KOURBKUuflSIr5s09riZllLv7PKMiQLqDJ0Oo9erxssyCTq3o5sECTB1wgLBYwKlZ +ogdHnO2mk+N9E7MlnbOcV6E2Lnc7Wx4f1I/oiVi6VnRgIqsxE9YF8q26C2VVlVFD+ 9uizhhYHeemF5ttAD5zLnh0u/i7MsnAnqyd8Wp6Y= Date: Thu, 30 May 2019 00:09:39 +0200 (CEST) From: Jiri Kosina To: "Rafael J. Wysocki" , Pavel Machek , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Josh Poimboeuf , Peter Zijlstra cc: x86@kernel.org, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v4] x86/power: Fix 'nosmt' vs. hibernation triple fault during resume In-Reply-To: Message-ID: References: User-Agent: Alpine 2.21 (LSU 202 2017-01-01) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Jiri Kosina As explained in 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") we always, no matter what, have to bring up x86 HT siblings during boot at least once in order to avoid first MCE bringing the system to its knees. That means that whenever 'nosmt' is supplied on the kernel command-line, all the HT siblings are as a result sitting in mwait or cpudile after going through the online-offline cycle at least once. This causes a serious issue though when a kernel, which saw 'nosmt' on its commandline, is going to perform resume from hibernation: if the resume from the hibernated image is successful, cr3 is flipped in order to point to the address space of the kernel that is being resumed, which in turn means that all the HT siblings are all of a sudden mwaiting on address which is no longer valid. That results in triple fault shortly after cr3 is switched, and machine reboots. Fix this by always waking up all the SMT siblings before initiating the 'restore from hibernation' process; this guarantees that all the HT siblings will be properly carried over to the resumed kernel waiting in resume_play_dead(), and acted upon accordingly afterwards, based on the target kernel configuration. Symmetricaly, the resumed kernel has to push the SMT siblings to mwait again in case it has SMT disabled; this means it has to online all the siblings when resuming (so that they come out of hlt) and offline them again to let them reach mwait. Cc: stable@vger.kernel.org # v4.19+ Debugged-by: Thomas Gleixner Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") Signed-off-by: Jiri Kosina --- v1 -> v2: - restructure error handling as suggested by peterz - add Rafael's ack v2 -> v3: - added extra online/offline dance for nosmt case during resume, as we want the siblings to be in mwait, not hlt - dropped peterz's and Rafael's acks for now due to the above v3 -> v4: - fix undefined return value from arch_resume_nosmt() in case it's not overriden by arch arch/x86/power/cpu.c | 10 ++++++++++ arch/x86/power/hibernate.c | 33 +++++++++++++++++++++++++++++++++ include/linux/cpu.h | 4 ++++ kernel/cpu.c | 4 ++-- kernel/power/hibernate.c | 9 +++++++++ 5 files changed, 58 insertions(+), 2 deletions(-) diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c index a7d966964c6f..513ce09e9950 100644 --- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -299,7 +299,17 @@ int hibernate_resume_nonboot_cpu_disable(void) * address in its instruction pointer may not be possible to resolve * any more at that point (the page tables used by it previously may * have been overwritten by hibernate image data). + * + * First, make sure that we wake up all the potentially disabled SMT + * threads which have been initially brought up and then put into + * mwait/cpuidle sleep. + * Those will be put to proper (not interfering with hibernation + * resume) sleep afterwards, and the resumed kernel will decide itself + * what to do with them. */ + ret = cpuhp_smt_enable(); + if (ret) + return ret; smp_ops.play_dead = resume_play_dead; ret = disable_nonboot_cpus(); smp_ops.play_dead = play_dead; diff --git a/arch/x86/power/hibernate.c b/arch/x86/power/hibernate.c index 4845b8c7be7f..fc413717a45f 100644 --- a/arch/x86/power/hibernate.c +++ b/arch/x86/power/hibernate.c @@ -11,6 +11,7 @@ #include #include #include +#include #include @@ -245,3 +246,35 @@ int relocate_restore_code(void) __flush_tlb_all(); return 0; } + +int arch_resume_nosmt(void) +{ + int ret = 0; + /* + * We reached this while coming out of hibernation. This means + * that SMT siblings are sleeping in hlt, as mwait is not safe + * against control transition during resume (see comment in + * hibernate_resume_nonboot_cpu_disable()). + * + * If the resumed kernel has SMT disabled, we have to take all the + * SMT siblings out of hlt, and offline them again so that they + * end up in mwait proper. + * + * Called with hotplug disabled. + */ + cpu_hotplug_enable(); + if (cpu_smt_control == CPU_SMT_DISABLED || + cpu_smt_control == CPU_SMT_FORCE_DISABLED) { + enum cpuhp_smt_control old = cpu_smt_control; + + ret = cpuhp_smt_enable(); + if (ret) + goto out; + ret = cpuhp_smt_disable(old); + if (ret) + goto out; + } +out: + cpu_hotplug_disable(); + return ret; +} diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 3813fe45effd..fcb1386bb0d4 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -201,10 +201,14 @@ enum cpuhp_smt_control { extern enum cpuhp_smt_control cpu_smt_control; extern void cpu_smt_disable(bool force); extern void cpu_smt_check_topology(void); +extern int cpuhp_smt_enable(void); +extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval); #else # define cpu_smt_control (CPU_SMT_NOT_IMPLEMENTED) static inline void cpu_smt_disable(bool force) { } static inline void cpu_smt_check_topology(void) { } +static inline int cpuhp_smt_enable(void) { return 0; } +static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0; } #endif /* diff --git a/kernel/cpu.c b/kernel/cpu.c index f2ef10460698..077fde6fb953 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -2061,7 +2061,7 @@ static void cpuhp_online_cpu_device(unsigned int cpu) kobject_uevent(&dev->kobj, KOBJ_ONLINE); } -static int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) +int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { int cpu, ret = 0; @@ -2093,7 +2093,7 @@ static int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) return ret; } -static int cpuhp_smt_enable(void) +int cpuhp_smt_enable(void) { int cpu, ret = 0; diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c index c8c272df7154..b65635753e8e 100644 --- a/kernel/power/hibernate.c +++ b/kernel/power/hibernate.c @@ -257,6 +257,11 @@ void swsusp_show_speed(ktime_t start, ktime_t stop, (kps % 1000) / 10); } +__weak int arch_resume_nosmt(void) +{ + return 0; +} + /** * create_image - Create a hibernation image. * @platform_mode: Whether or not to use the platform driver. @@ -324,6 +329,10 @@ static int create_image(int platform_mode) Enable_cpus: suspend_enable_secondary_cpus(); + /* Allow architectures to do nosmt-specific post-resume dances */ + if (!in_suspend) + error = arch_resume_nosmt(); + Platform_finish: platform_finish(platform_mode); -- Jiri Kosina SUSE Labs