Received: by 2002:a25:ab43:0:0:0:0:0 with SMTP id u61csp490374ybi; Thu, 30 May 2019 01:48:43 -0700 (PDT) X-Google-Smtp-Source: APXvYqxIVOVoYNPS21tU8Wbxoin98+MXwo+51rr1UdbdWsnqZM1Z6EInhY3LvV4jDONpvntmSrwY X-Received: by 2002:a65:62c4:: with SMTP id m4mr2822266pgv.308.1559206123021; Thu, 30 May 2019 01:48:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1559206123; cv=none; d=google.com; s=arc-20160816; b=ccg9AXKluDT/UtbABKjzGt15S8YMxACRAuhDJcIuJ7+wfIMQd/Nn/vRcLlsoFb4A1R hEaA/zMchvvS1hTdpqkR1iKGHSpj1RcDz2VTDAWZJrXqE4Phmdzw0/OFCNvErv669GFr YYVSo+War0JW1vSoddaHdLnyeEJALybLlXLpPqaijT8xoLPGcCY0oxcQIfwvx/bEPZYU 6qRo5KpjFBUNV2vYq4kimzn5aHnM5rhFp4iDTITUgvOIYIAcsPHMDnbCaQ4/OdNPNh7y 9uYpfwqtcN958AEJ9wgDteTERh8uahtWjjylsj1rIXr8JTLhC4pD/bwQpcU1htZBS63d yz0w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version; bh=7wqwILg9bPvbTOZyUBEavR8/uT+INOe/JPe0pE90Xiw=; b=gOtKmo+A7cVCSWaFKypWroMcW7n2ne2ffysFMtqpFijIQuKYdxDBpozyvVkggA2WF9 Mv7REg/k0BdmCEudP8mpkTjmd2TsJkqyJAFerftDqSbD02xxkR3nkXmXM9A1eTYJFqOx J+nudJldWAqjOUABv/JHRnuIzOn/aVCivX2OD5w/0QZ4rcJb7ilrzavWwXmTRJT5zc9b v/lG6/oc2nDHuP3zImta1l2NxJmQ0zVVfQOtD0j8XNY/KKqHQsmafzUosTehA6xikyMY JPwL0ZjTKSL0uPmmc1woeXhagQmavYfPxQ3YQ1J/F4SRY/KIAbYDGHbwhhchTfrk0Qni oplw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id h4si2125613pjs.103.2019.05.30.01.48.27; Thu, 30 May 2019 01:48:43 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726973AbfE3IrM (ORCPT + 99 others); Thu, 30 May 2019 04:47:12 -0400 Received: from mail-ot1-f67.google.com ([209.85.210.67]:41921 "EHLO mail-ot1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726684AbfE3IrM (ORCPT ); Thu, 30 May 2019 04:47:12 -0400 Received: by mail-ot1-f67.google.com with SMTP id l25so4852849otp.8; Thu, 30 May 2019 01:47:11 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=7wqwILg9bPvbTOZyUBEavR8/uT+INOe/JPe0pE90Xiw=; b=U4BIxpwcXbC2fET4TSrIgPEh+OKBvrPHyaHYVcQ2f3U3qY/O1meLm6Dyr31oJdbHgl P3X9pFQ2n3WSbqBYMksr7uqhKWKMJTGgCMRtOvLlRW6EToDFuxpEky6lH7Vz16sAyPCz DwemZClnWzuNPbR8iBw100VO4pjng4XSzoT9QNMZXdnYSnTMZUZQm2P31OhMykkISm0g 3i0GtxtQYddWLSRNDmRjzPRBKlWwHEzvcOuq2RbZZ0oa3+3RN+gy24rEmJCviZ9LucT1 eJmk03YK8f/fmtKLcGMMi782iDJ/sA4LqnqbM6AqWuqD5cdSOGq9JZpDHe1Cw4YXhYaH B74g== X-Gm-Message-State: APjAAAVntLQ4+n+eQjWglFFBwZEQ5R9AP/G/+NbpUcHw6ZVMMiPi5Xwl JEA/TQ42Roq2LQ5cp60fMjpeNdpQc4sC6VykWgk= X-Received: by 2002:a9d:6e88:: with SMTP id a8mr1610597otr.118.1559206030915; Thu, 30 May 2019 01:47:10 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: "Rafael J. Wysocki" Date: Thu, 30 May 2019 10:46:59 +0200 Message-ID: Subject: Re: [PATCH v4] x86/power: Fix 'nosmt' vs. hibernation triple fault during resume To: Jiri Kosina , "the arch/x86 maintainers" Cc: "Rafael J. Wysocki" , Pavel Machek , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Josh Poimboeuf , Peter Zijlstra , Linux PM , Linux Kernel Mailing List Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, May 30, 2019 at 12:09 AM Jiri Kosina wrote: > > From: Jiri Kosina > > As explained in > > 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") > > we always, no matter what, have to bring up x86 HT siblings during boot at > least once in order to avoid first MCE bringing the system to its knees. > > That means that whenever 'nosmt' is supplied on the kernel command-line, > all the HT siblings are as a result sitting in mwait or cpudile after > going through the online-offline cycle at least once. > > This causes a serious issue though when a kernel, which saw 'nosmt' on its > commandline, is going to perform resume from hibernation: if the resume > from the hibernated image is successful, cr3 is flipped in order to point > to the address space of the kernel that is being resumed, which in turn > means that all the HT siblings are all of a sudden mwaiting on address > which is no longer valid. > > That results in triple fault shortly after cr3 is switched, and machine > reboots. > > Fix this by always waking up all the SMT siblings before initiating the > 'restore from hibernation' process; this guarantees that all the HT > siblings will be properly carried over to the resumed kernel waiting in > resume_play_dead(), and acted upon accordingly afterwards, based on the > target kernel configuration. > Symmetricaly, the resumed kernel has to push the SMT siblings to mwait > again in case it has SMT disabled; this means it has to online all > the siblings when resuming (so that they come out of hlt) and offline > them again to let them reach mwait. > > Cc: stable@vger.kernel.org # v4.19+ > Debugged-by: Thomas Gleixner > Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") > Signed-off-by: Jiri Kosina LGTM And I would prefer this one to go in through the PM tree due to the hibernate core changes, so can I get an ACK from the x86 arch side here, please? > --- > > v1 -> v2: > - restructure error handling as suggested by peterz > - add Rafael's ack > > v2 -> v3: > - added extra online/offline dance for nosmt case during > resume, as we want the siblings to be in mwait, not hlt > - dropped peterz's and Rafael's acks for now due to the above > > v3 -> v4: > - fix undefined return value from arch_resume_nosmt() in case > it's not overriden by arch > > arch/x86/power/cpu.c | 10 ++++++++++ > arch/x86/power/hibernate.c | 33 +++++++++++++++++++++++++++++++++ > include/linux/cpu.h | 4 ++++ > kernel/cpu.c | 4 ++-- > kernel/power/hibernate.c | 9 +++++++++ > 5 files changed, 58 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c > index a7d966964c6f..513ce09e9950 100644 > --- a/arch/x86/power/cpu.c > +++ b/arch/x86/power/cpu.c > @@ -299,7 +299,17 @@ int hibernate_resume_nonboot_cpu_disable(void) > * address in its instruction pointer may not be possible to resolve > * any more at that point (the page tables used by it previously may > * have been overwritten by hibernate image data). > + * > + * First, make sure that we wake up all the potentially disabled SMT > + * threads which have been initially brought up and then put into > + * mwait/cpuidle sleep. > + * Those will be put to proper (not interfering with hibernation > + * resume) sleep afterwards, and the resumed kernel will decide itself > + * what to do with them. > */ > + ret = cpuhp_smt_enable(); > + if (ret) > + return ret; > smp_ops.play_dead = resume_play_dead; > ret = disable_nonboot_cpus(); > smp_ops.play_dead = play_dead; > diff --git a/arch/x86/power/hibernate.c b/arch/x86/power/hibernate.c > index 4845b8c7be7f..fc413717a45f 100644 > --- a/arch/x86/power/hibernate.c > +++ b/arch/x86/power/hibernate.c > @@ -11,6 +11,7 @@ > #include > #include > #include > +#include > > #include > > @@ -245,3 +246,35 @@ int relocate_restore_code(void) > __flush_tlb_all(); > return 0; > } > + > +int arch_resume_nosmt(void) > +{ > + int ret = 0; > + /* > + * We reached this while coming out of hibernation. This means > + * that SMT siblings are sleeping in hlt, as mwait is not safe > + * against control transition during resume (see comment in > + * hibernate_resume_nonboot_cpu_disable()). > + * > + * If the resumed kernel has SMT disabled, we have to take all the > + * SMT siblings out of hlt, and offline them again so that they > + * end up in mwait proper. > + * > + * Called with hotplug disabled. > + */ > + cpu_hotplug_enable(); > + if (cpu_smt_control == CPU_SMT_DISABLED || > + cpu_smt_control == CPU_SMT_FORCE_DISABLED) { > + enum cpuhp_smt_control old = cpu_smt_control; > + > + ret = cpuhp_smt_enable(); > + if (ret) > + goto out; > + ret = cpuhp_smt_disable(old); > + if (ret) > + goto out; > + } > +out: > + cpu_hotplug_disable(); > + return ret; > +} > diff --git a/include/linux/cpu.h b/include/linux/cpu.h > index 3813fe45effd..fcb1386bb0d4 100644 > --- a/include/linux/cpu.h > +++ b/include/linux/cpu.h > @@ -201,10 +201,14 @@ enum cpuhp_smt_control { > extern enum cpuhp_smt_control cpu_smt_control; > extern void cpu_smt_disable(bool force); > extern void cpu_smt_check_topology(void); > +extern int cpuhp_smt_enable(void); > +extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval); > #else > # define cpu_smt_control (CPU_SMT_NOT_IMPLEMENTED) > static inline void cpu_smt_disable(bool force) { } > static inline void cpu_smt_check_topology(void) { } > +static inline int cpuhp_smt_enable(void) { return 0; } > +static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0; } > #endif > > /* > diff --git a/kernel/cpu.c b/kernel/cpu.c > index f2ef10460698..077fde6fb953 100644 > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -2061,7 +2061,7 @@ static void cpuhp_online_cpu_device(unsigned int cpu) > kobject_uevent(&dev->kobj, KOBJ_ONLINE); > } > > -static int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) > +int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) > { > int cpu, ret = 0; > > @@ -2093,7 +2093,7 @@ static int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) > return ret; > } > > -static int cpuhp_smt_enable(void) > +int cpuhp_smt_enable(void) > { > int cpu, ret = 0; > > diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c > index c8c272df7154..b65635753e8e 100644 > --- a/kernel/power/hibernate.c > +++ b/kernel/power/hibernate.c > @@ -257,6 +257,11 @@ void swsusp_show_speed(ktime_t start, ktime_t stop, > (kps % 1000) / 10); > } > > +__weak int arch_resume_nosmt(void) > +{ > + return 0; > +} > + > /** > * create_image - Create a hibernation image. > * @platform_mode: Whether or not to use the platform driver. > @@ -324,6 +329,10 @@ static int create_image(int platform_mode) > Enable_cpus: > suspend_enable_secondary_cpus(); > > + /* Allow architectures to do nosmt-specific post-resume dances */ > + if (!in_suspend) > + error = arch_resume_nosmt(); > + > Platform_finish: > platform_finish(platform_mode); > > > -- > Jiri Kosina > SUSE Labs >