From: Andy Lutomirski
Date: Fri, 31 May 2019 09:51:09 -0700
Subject: Re: [PATCH v4] x86/power: Fix 'nosmt' vs. hibernation triple fault during resume
To: Josh Poimboeuf
Cc: Jiri Kosina, Andy Lutomirski, "Rafael J. Wysocki", "Rafael J. Wysocki",
    Thomas Gleixner, "the arch/x86 maintainers", Pavel Machek, Ingo Molnar,
    Borislav Petkov, "H. Peter Anvin", Peter Zijlstra, Linux PM,
    Linux Kernel Mailing List
In-Reply-To: <20190531161952.dps3grwg4ytrpuqw@treble>
References: <20190531051456.fzkvn62qlkf6wqra@treble> <5564116.e9OFvgDRbB@kreacher>
    <20190531152626.4nmyc7lj6mjwuo2v@treble> <20190531161952.dps3grwg4ytrpuqw@treble>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 31, 2019 at 9:19 AM Josh Poimboeuf wrote:
>
> On Fri, May 31, 2019 at 05:41:18PM +0200, Jiri Kosina wrote:
> > On Fri, 31 May 2019, Josh Poimboeuf wrote:
> >
> > > The only question I'd have is if we have data on the power savings
> > > difference between hlt and mwait. mwait seems to wake up on a lot of
> > > different conditions which might negate its deeper sleep state.
> >
> > hlt wakes up on basically the same set of events, but has the
> > auto-restarting semantics on some of them (especially SMM). So the wakeup
> > frequency itself shouldn't really contribute to power consumption
> > difference; it's the C-state that mwait allows CPU to enter.
>
> Ok. I reluctantly surrender :-) For your v4:
>
>   Reviewed-by: Josh Poimboeuf
>
> It works as a short term fix, but it's fragile, and it does feel like
> we're just adding more duct tape, as Andy said.
>

Just to clarify what I was thinking, it seems like soft-offlining a CPU
and resuming a kernel have fundamentally different requirements. To
soft-offline a CPU, we want to get power consumption as low as possible
and make sure that MCE won't kill the system.
It's okay for the CPU to occasionally execute some code. For resume,
what we're really doing is trying to hand control of all CPUs from
kernel A to kernel B.

There are two basic ways to hand off control of a given CPU: we can jump
(with JMP, RET, horrible self-modifying code, etc) from one kernel to
the other, or we can attempt to make a given CPU stop executing code
from either kernel at all and then forcibly wrench control of it in
kernel B. Either approach seems okay, but the latter approach depends
on getting the CPU to reliably stop executing code. We don't care about
power consumption for resume, and I'm not even convinced that we need to
be able to survive an MCE that happens while we're resuming, although
surviving MCE would be nice.

So if we don't want to depend on nasty system details at all, we could
have the first kernel explicitly wake up all CPUs and hand them all off
to the new kernel, more or less the same way that we hand over control
of the BSP right now. Or we can look for a way to tell all the APs to
stop executing kernel code, and the only architectural way I know of to
do that is to send an INIT IPI (and then presumably deassert INIT -- the
SDM is a bit vague).

Or we could allocate a page, stick a GDT, a TSS, and a 1: hlt; jmp 1b in
it, turn off paging, and run that code. And then somehow convince the
kernel we load not to touch that page until it finishes waking up all
CPUs. This seems conceptually simple and very robust, but I'm not sure
it fits in with the way hibernation works right now at all.
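
For illustration only, here is a rough sketch of what the code part of
such a parking page could look like. It is not taken from any patch in
this thread. The names PARK_PAGE_SIZE, park_stub and build_park_page are
hypothetical, the 4 KiB page size is an assumption, and the GDT and TSS
that would also have to live in the page are left out. The only thing it
shows is the machine-code encoding of "1: hlt; jmp 1b", which is three
bytes and is the same in 16-, 32- and 64-bit mode.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PARK_PAGE_SIZE 4096u	/* assumption: one 4 KiB page */

/* Encoding of "1: hlt; jmp 1b". */
static const uint8_t park_stub[] = {
	0xf4,		/* 1: hlt                                    */
	0xeb, 0xfd,	/*    jmp 1b  (rel8 = -3, back to the hlt)   */
};

/*
 * Zero a page-sized buffer and place the parking loop at stub_off.
 * Returns the number of stub bytes written, or 0 if the stub would
 * not fit inside the page.  A real parking page would also need a
 * GDT and a TSS, as described above; this sketch omits both.
 */
static size_t build_park_page(uint8_t *page, size_t stub_off)
{
	if (stub_off > PARK_PAGE_SIZE - sizeof(park_stub))
		return 0;

	memset(page, 0, PARK_PAGE_SIZE);
	memcpy(page + stub_off, park_stub, sizeof(park_stub));
	return sizeof(park_stub);
}
```

A caller would pass a page-sized buffer and an offset, for example
build_park_page(buf, 0x100). The hard part, getting the resumed kernel
to leave that page alone until all CPUs have been woken, is not
addressed here.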