From: Andy Lutomirski
Date: Fri, 31 May 2019 07:24:56 -0700
Subject: Re: [PATCH v4] x86/power: Fix 'nosmt' vs. hibernation triple fault during resume
To: "Rafael J. Wysocki"
Cc: Jiri Kosina, Josh Poimboeuf, "Rafael J. Wysocki", Thomas Gleixner, "the arch/x86 maintainers", Pavel Machek, Ingo Molnar, Borislav Petkov, "H. Peter Anvin", Peter Zijlstra, Linux PM, Linux Kernel Mailing List, Andy Lutomirski
In-Reply-To: <5564116.e9OFvgDRbB@kreacher>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 31, 2019 at 1:57 AM Rafael J. Wysocki wrote:
>
> On Friday, May 31, 2019 10:47:21 AM CEST Jiri Kosina wrote:
> > On Fri, 31 May 2019, Josh Poimboeuf wrote:
> >
> > > > I disagree with that from the backwards compatibility point of view.
> > > >
> > > > I personally am quite frequently using different combinations of
> > > > resumer/resumee kernels, and I've never been bitten by it so far. I'd guess
> > > > I am not the only one.
> > > > Fixmap sort of breaks that invariant.
> > >
> > > Right now there is no backwards compatibility because nosmt resume is
> > > already broken.
> >
> > Yeah, well, but that's "only" for nosmt kernels at least.
> >
> > > For "future" backwards compatibility we could just define a hard-coded
> > > reserved fixmap page address, adjacent to the vsyscall reserved address.
> > >
> > > Something like this (not yet tested)? Maybe we could also remove the
> > > resume_play_dead() hack?
> >
> > Does it also solve cpuidle case? I have no overview what all the cpuidle
> > drivers might be potentially doing in their ->enter_dead() callbacks.
> > Rafael?
>
> There are just two of them, ACPI cpuidle and intel_idle, and they both should
> be covered.
>
> In any case, I think that this is the way to go here even though it may be somewhat
> problematic to start with.
>

Given that there seems to be a genuine compatibility issue right now, can we design an actual sane way to hand off control of all CPUs rather than adding duct tape to an extremely fragile mechanism? I can think of at least two sensible solutions:

1. Have a self-contained "play dead for kexec/resume" function that touches only a few well-defined physical pages: a set of page tables and a page of code. Load CR3 to point to those page tables, fill in the code with some form of infinite loop, and run it. Or just turn off paging entirely and run the infinite loop. Have the kernel doing the resuming inform the kernel being resumed of which pages these are, and have the kernel being resumed take over all CPUs before reusing the pages.

2. Put the CPU all the way to sleep by sending it an INIT IPI.

Version 2 seems very simple and robust. Is there a reason we can't do it? We obviously don't want to do it for normal offline because it might be a high-power state, but a CPU in the wait-for-SIPI state is not going to exit that state all by itself.

The patch to implement #2 should be short and sweet as long as we are careful to only put genuine APs to sleep like this. The only downside I can see is that a new kernel resuming an old kernel that was booted with nosmt is going to waste power, but I don't think that's a showstopper.
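To make option 1 a little more concrete, here is a userspace C sketch of its two invariants: the parking page holds nothing but a tiny loop, and the resumed kernel must not reuse the handoff pages until it has taken over every parked CPU. `struct park_handoff`, its fields, `fill_park_page()` and `pfn_may_be_reused()` are hypothetical names invented for illustration, and `cli; hlt; jmp` is just one choice for "some form of infinite loop". Nothing here loads CR3 or touches real hardware; a real implementation would also need those page tables to map the code page at the address the CPU is executing from when CR3 is switched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PARK_PAGE_SIZE 4096

/*
 * Hypothetical descriptor the resuming kernel would pass to the kernel
 * being resumed. Only the root page table is tracked here for brevity.
 */
struct park_handoff {
	uint64_t pgtable_pfn; /* root page table (what CR3 would point at) */
	uint64_t code_pfn;    /* page holding the parking loop */
};

/*
 * x86 machine code for "1: cli; hlt; jmp 1b":
 *   FA     cli
 *   F4     hlt
 *   EB FC  jmp rel8 -4 (back to offset 0)
 * hlt can still be left on NMI/SMI, so the jmp re-parks the CPU.
 */
static const uint8_t park_loop[] = { 0xFA, 0xF4, 0xEB, 0xFC };

static_assert(sizeof(park_loop) <= PARK_PAGE_SIZE,
	      "parking loop must fit in one page");

/* Fill a code page with the parking loop; pad the rest with int3 (0xCC). */
void fill_park_page(uint8_t page[PARK_PAGE_SIZE])
{
	memset(page, 0xCC, PARK_PAGE_SIZE);
	memcpy(page, park_loop, sizeof(park_loop));
}

/*
 * The resumed kernel may reuse a handoff page only after every parked
 * CPU has been taken over; other pages are unaffected.
 */
bool pfn_may_be_reused(const struct park_handoff *h, uint64_t pfn,
		       bool all_aps_taken_over)
{
	if (all_aps_taken_over)
		return true;
	return pfn != h->pgtable_pfn && pfn != h->code_pfn;
}
```

The int3 padding is a defensive choice: if a CPU ever ran past the loop it would trap instead of executing stale bytes.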
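For option 2, the INIT IPI itself amounts to writes to the local APIC's Interrupt Command Register (ICR). The sketch below only builds those register values for xAPIC mode; it does not perform MMIO. The bit values follow the Intel SDM, the macro names match Linux's `apicdef.h`, and the assert/de-assert pair is modeled on the INIT sequence the kernel uses when booting APs. `struct icr_write` and `build_init_park()` are hypothetical names, and refusing the boot CPU, the calling CPU and IDs above 0xFF is just one reading of "only put genuine APs to sleep". x2APIC mode uses a single 64-bit ICR MSR with a 32-bit destination, which this sketch does not cover.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* xAPIC ICR bit values (Intel SDM; same names/values as Linux apicdef.h). */
#define APIC_DM_INIT        0x00500u /* delivery mode 101b = INIT */
#define APIC_INT_ASSERT     0x04000u /* level: assert */
#define APIC_INT_LEVELTRIG  0x08000u /* trigger mode: level */
#define APIC_DEST_PHYSICAL  0x00000u /* physical destination mode */
#define XAPIC_MAX_APIC_ID   0xFFu    /* xAPIC destination field is 8 bits */

/*
 * One ICR write: 'hi' goes to ICR[63:32] (offset 0x310) first, then 'lo'
 * to ICR[31:0] (offset 0x300); the low write is what sends the IPI.
 */
struct icr_write {
	uint32_t hi;
	uint32_t lo;
};

/*
 * Hypothetical helper: build the INIT assert/de-assert pair that would put
 * 'target' into wait-for-SIPI. Returns false for anything that is not a
 * genuine AP: the boot CPU, the calling CPU, or an ID xAPIC cannot address.
 */
bool build_init_park(uint32_t target, uint32_t bsp, uint32_t self,
		     struct icr_write out[2])
{
	assert(out != NULL);
	if (target == bsp || target == self || target > XAPIC_MAX_APIC_ID)
		return false;

	const uint32_t dest = target << 24; /* destination field, ICR[63:56] */

	out[0].hi = dest;
	out[0].lo = APIC_INT_LEVELTRIG | APIC_INT_ASSERT | APIC_DM_INIT |
		    APIC_DEST_PHYSICAL;
	out[1].hi = dest;
	out[1].lo = APIC_INT_LEVELTRIG | APIC_DM_INIT | APIC_DEST_PHYSICAL;
	return true;
}
```

For target APIC ID 3 this yields `hi = 0x03000000` with `lo = 0xC500` for the assert write and `lo = 0x8500` for the de-assert write.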