Date: Mon, 5 Jun 2023 16:08:59 -0700
In-Reply-To: <87pm694jmg.ffs@tglx>
References: <20230603193439.502645149@linutronix.de> <87pm694jmg.ffs@tglx>
Subject: Re: [patch 0/6] Cure kexec() vs. mwait_play_dead() troubles
From: Sean Christopherson
To: Thomas Gleixner
Cc: LKML, x86@kernel.org, Ashok Raj, Dave Hansen, Tony Luck,
    Arjan van de Veen, Peter Zijlstra, Eric Biederman
List-ID: linux-kernel@vger.kernel.org

On Tue, Jun 06, 2023, Thomas Gleixner wrote:
> On Mon, Jun 05 2023 at 10:41, Sean Christopherson wrote:
> > On Sat, Jun 03, 2023, Thomas Gleixner wrote:
> >> This is only half safe because HLT can resume execution due to NMI, SMI and
> >> MCE. Unfortunately there is no real safe mechanism to "park" a CPU reliably,
> >
> > On Intel. On AMD, enabling EFER.SVME and doing CLGI will block everything except
> > single-step #DB (lol) and RESET. #MC handling is implementation-dependent and
> > *might* cause shutdown, but at least there's a chance it will work. And presumably
> > modern CPUs do pend the #MC until GIF=1.
>
> Abusing SVME for that is definitely in the realm of creative bonus
> points, but not necessarily a general purpose solution.

Heh, my follow-up ideas for Intel are to abuse XuCode or SEAM ;-)

> >> So parking them via INIT is not completely solving the problem, but it
> >> takes at least NMI and SMI out of the picture.
> >
> > Don't most SMM handlers rendezvous all CPUs? I.e. won't blocking SMIs indefinitely
> > potentially cause problems too?
>
> Not that I'm aware of. If so then this would be a hideous firmware bug
> as firmware must be aware of CPUs which hang around in INIT independent
> of this.

I was thinking of the EDKII code in UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c, e.g.
SmmWaitForApArrival(). I've never dug deeply into how EDKII uses SMM, what its
timeouts are, etc., I just remember coming across that code when poking around
EDKII for other stuff.

> > Why not carve out a page that's hidden across kexec() to hold whatever code+data
> > is needed to safely execute a HLT loop indefinitely?
>
> See below.
>
> > E.g. doesn't the original kernel provide the e820 tables for the
> > post-kexec() kernel?
>
> Only for crash kernels if I'm not missing something.

Ah, drat.

> Making this work for regular kexec() including this:
>
> > To avoid OOM after many kexec(), reserving a page could be done iff
> > the current kernel wasn't itself kexec()'d.
>
> would be possible and I thought about it, but that needs a complete new
> design of "offline", "shutdown offline" and a non-trivial amount of
> backwards compatibility magic because you can't assume that the kexec()
> kernel version is greater or equal to the current one. kexec() is
> supposed to work both ways, downgrading and upgrading. IOW, that ship
> sailed long ago.

Right, but doesn't gaining "full" protection require ruling out unenlightened
downgrades? E.g. if someone downgrades to an old kernel, doesn't hide the
"offline" CPUs from the kexec() kernel, and boots the old kernel with -nosmt or
whatever, then that old kernel will do the naive MWAIT or unprotected HLT and
it's hosed again.

If we're relying on the admin to hide the offline CPUs, could we usurp an existing
kernel param to hide a small chunk of memory instead?
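
For illustration only, the "reserve a page iff the current kernel wasn't itself
kexec()'d" idea quoted above could be modeled in user space roughly as below.
This is a sketch under loud assumptions, not kernel code: struct park_state,
reserve_page() and park_page_for() are hypothetical names, the allocator is a
stand-in that hands out fake addresses, and how a real kernel would learn that
it was kexec()'d or inherit the page is left open, as it is in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical boot-time state; none of these names exist in the kernel. */
struct park_state {
	bool booted_via_kexec;   /* was this kernel itself started via kexec()? */
	uint64_t inherited_page; /* park page passed on by the previous kernel, 0 if none */
};

/* Stand-in allocator: hands out fake 4 KiB-aligned addresses and counts them. */
static uint64_t next_free = 0x100000;
static unsigned int pages_reserved;

static uint64_t reserve_page(void)
{
	uint64_t page = next_free;

	next_free += 0x1000;
	pages_reserved++;
	return page;
}

/*
 * Return the page that would hold the HLT-loop code+data for parked CPUs.
 * Only a kernel that was not kexec()'d reserves a new page; a kexec()'d
 * kernel reuses whatever it inherited, so repeated kexec() cycles never
 * grow the reservation. A return value of 0 means "no park page", e.g.
 * when the previous kernel did not provide one.
 */
static uint64_t park_page_for(const struct park_state *st)
{
	assert(st != NULL);

	if (st->booted_via_kexec)
		return st->inherited_page;

	return reserve_page();
}
```

In this model a cold-booted kernel reserves exactly one page, every later
kexec()'d kernel hands the same page on, and a kernel kexec()'d from an
unenlightened predecessor gets 0 and would have to fall back to whatever it
does today, which is the downgrade hole discussed above.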