Received: by 2002:a05:7412:8d10:b0:f3:1519:9f41 with SMTP id bj16csp4128959rdb; Mon, 11 Dec 2023 09:34:37 -0800 (PST) X-Google-Smtp-Source: AGHT+IGW3ICEr4O7ykfN/r10HidzPaaSJmKJ3SXP2BXEorf3zdR46XpnjvW/OeQODfYq8G5V8WNU X-Received: by 2002:a17:902:cec7:b0:1d0:9c9c:b2c0 with SMTP id d7-20020a170902cec700b001d09c9cb2c0mr2343287plg.41.1702316077686; Mon, 11 Dec 2023 09:34:37 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702316077; cv=none; d=google.com; s=arc-20160816; b=ZaUqJ5eRlK9E/70MxnW69NLjM6NnW8AJ0jKBFjwKfwArWnwsZ22dnu/WY8VFkC5XMR XMrzWpENZUdcDS44IoUpvVckB+E25i/gmGBHskzwZgasZ/n2j884zJoJGSsJL/huiRMH Y7x/jEa8LrUWvGIukDAiL+LGpScTqNPFE4pq77drDRBDzBdeiXAzLmwIVWj2GfxVN+zW 5LrFaP8cCJbb6y8fYlxjEhCNZdJEn4YrP//rkXpzdJaYyQNEn2j9+CCDdnpS5Rf7Wpfb B2r+ZDt87rTVOQeJyJbvqpFEicmMrOIo6lXG4ck0dfsiQy4k77fsHUvTChE3ldqwSsUX s5KQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:from:subject :message-id:references:mime-version:in-reply-to:date:dkim-signature; bh=4dy1GZPt4ahloA1lKMAAh2vw8dnz9eCUNT+pZwXAOsM=; fh=cPUxUoqZIFPT09GyZ4IUVzLlsuugMuLvnV0NamSMdsc=; b=rTGVkMacqbmivqwD0kSn/munIAIRQQu/9WWxgJT2Oyjb6USsz44v9KAVpfPRJwCiVF OmtaqIrECBe+x2jKirZML2gj6LiEL2CCMpPzH9tN/8fEpEmiyZf8uuPtLarFBUXBZaI+ M18X2yIAr5qZZTEJmT5kIuj5iBF1VoEff7eT/4rRQGVP4dG4wb10WgaZurAa26veGvt0 YIvRrbyCZf+T4RH9VdkcYVOlB2RNKbMH3GEkni+uIkbPN1vRYQivb3pFyDLZOTANcI2v Fqx+G4RQ0+psFR6RHgxxLN+zC/2XSrHjtijYRhPfkQKbZsAl/Fyk1kLH9L0002ZBEqa+ YENA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=PsrxwYGY; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:1 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from morse.vger.email (morse.vger.email. [2620:137:e000::3:1]) by mx.google.com with ESMTPS id i18-20020a17090332d200b001d0a96d328esi6437261plr.265.2023.12.11.09.34.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 11 Dec 2023 09:34:37 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:1 as permitted sender) client-ip=2620:137:e000::3:1; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=PsrxwYGY; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:1 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by morse.vger.email (Postfix) with ESMTP id 34B8A80725C5; Mon, 11 Dec 2023 09:34:35 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at morse.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344683AbjLKReV (ORCPT + 99 others); Mon, 11 Dec 2023 12:34:21 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36436 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344543AbjLKReU (ORCPT ); Mon, 11 Dec 2023 12:34:20 -0500 Received: from mail-pl1-x64a.google.com (mail-pl1-x64a.google.com [IPv6:2607:f8b0:4864:20::64a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C17EFC3 for ; Mon, 11 Dec 2023 09:34:26 -0800 (PST) Received: by mail-pl1-x64a.google.com with SMTP id d9443c01a7336-1d09a64eaebso41740045ad.3 for ; Mon, 11 Dec 2023 09:34:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1702316066; x=1702920866; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=4dy1GZPt4ahloA1lKMAAh2vw8dnz9eCUNT+pZwXAOsM=; b=PsrxwYGYkWHe0CkFvfaqqYjLelWXMpmqt8sKJnJ+TJR8yVwAIg9QBqI1Jk9k7cZjFV pqfxvWHWXcELwExNBrw9OIsUzU8lhC8lTc/A7FhgSXITD+9wVLoJY0bx67BVjrnmsWDJ 81jcP8+UHbwyvx3wDqVsy+JSJj0kUUeCLgm8YKj/18PeSRwUBh6xApeM5Ym3KXbpQQZs 26q2L6GtssE7w4jkpH0DqrlVpq+JDb7zPIHxgugASvFrouj0+f7BsI4caL2SdGmxKUjq V1hSOfdVa/9INaQhZbMsTVYkgXgNEEyZfkxs9gnG8wO6AAdC3Fyq5Y8JvPix+qwa1VOm mSIg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702316066; x=1702920866; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=4dy1GZPt4ahloA1lKMAAh2vw8dnz9eCUNT+pZwXAOsM=; b=THSRzyi1u5ETpuw6gcyYpLM99O/BlSDYlDxNFSOAF7bKfYJlnH8eP+ncfGPi6lfK9U N6jUOkgINkNFnlIiea6JrjYin+HivWdl7GWU2+wPIqfKGVgxDhOD60WNOuHItEAw/lTr KCDTV2ReRDn+W0fR/vTSjoVlSzgP9kSBtRCU7TiHVWG8PQXfATeREWSm0WaXDfXUTPIB zpFQSSD3kG8NqDBMts5A00Rsr7ZejMdmjLmzOpjxn95EzQ6wdEFz8Y6wDaobKdiJQK3c +COYy3eDwfc+YmslQaPX/DfQEoykmSbzqmSXVOv2JbGvH2WykQFZrbBvpDo+qQxdtkoF 7ymg== X-Gm-Message-State: AOJu0YyzlYwzs4X8AyhhIzfOzR29YnodvKPOUb0gSisMUlu2iOaavtw4 5bEvv/Ql0KW35Ue+56B6i9aUqVwiCM4= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a17:903:41cf:b0:1d0:80cd:4c44 with SMTP id u15-20020a17090341cf00b001d080cd4c44mr36557ple.10.1702316066276; Mon, 11 Dec 2023 09:34:26 -0800 (PST) Date: Mon, 11 Dec 2023 09:34:24 -0800 In-Reply-To: Mime-Version: 1.0 References: <20230512233127.804012-1-seanjc@google.com> <20230512233127.804012-2-seanjc@google.com> Message-ID: Subject: Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown From: Sean Christopherson To: James Gowans Cc: "pbonzini@redhat.com" , Alexander Graf , "Jan =?utf-8?Q?Sch=C3=B6nherr?=" , "ebiederm@xmission.com" , "yuzenghui@huawei.com" , "atishp@atishpatra.org" , "kvm-riscv@lists.infradead.org" , "james.morse@arm.com" , "suzuki.poulose@arm.com" , "oliver.upton@linux.dev" , "chenhuacai@kernel.org" , "linux-kernel@vger.kernel.org" , "kvmarm@lists.linux.dev" , "maz@kernel.org" , "kvm@vger.kernel.org" , "aleksandar.qemu.devel@gmail.com" , "anup@brainfault.org" , "kexec@lists.infradead.org" Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-8.4 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on morse.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (morse.vger.email [0.0.0.0]); Mon, 11 Dec 2023 09:34:35 -0800 (PST) On Sat, Dec 09, 2023, James Gowans wrote: > Hi Sean, >=20 > Blast from the past but I've just been bitten by this patch when > rebasing across v6.4. >=20 > On Fri, 2023-05-12 at 16:31 -0700, Sean Christopherson wrote: > > Use syscore_ops.shutdown to disable hardware virtualization during a > > reboot instead of using the dedicated reboot_notifier so that KVM disab= les > > virtualization _after_ system_state has been updated.=C2=A0 This will a= llow > > fixing a race in KVM's handling of a forced reboot where KVM can end up > > enabling hardware virtualization between kernel_restart_prepare() and > > machine_restart(). >=20 > The issue is that, AFAICT, the syscore_ops.shutdown are not called when > doing a kexec. Reboot notifiers are called across kexec via: >=20 > kernel_kexec > kernel_restart_prepare > blocking_notifier_call_chain > kvm_reboot >=20 > So after this patch, KVM is not shutdown during kexec; if hardware virt > mode is enabled then the kexec hangs in exactly the same manner as you > describe with the reboot. >=20 > Some specific shutdown callbacks, for example IOMMU, HPET, IRQ, etc are > called in native_machine_shutdown, but KVM is not one of these. >=20 > Thoughts on possible ways to fix this: > a) go back to reboot notifiers > b) get kexec to call syscore_shutdown() to invoke all of these callbacks > c) Add a KVM-specific callback to native_machine_shutdown(); we only > need this for Intel x86, right? I don't like (c). VMX is the most sensitive/problematic, e.g. the whole bl= ocking of INIT thing, but SVM can also run afoul of EFER.SVME being cleared, and K= VM really=20 should leave virtualization enabled across kexec(), even if leaving virtual= ization enabled is relatively benign on other architectures. One more option would be: d) Add another sycore hook, e.g. syscore_kexec() specifically for this pat= h. > My slight preference is towards adding syscore_shutdown() to kexec, but > I'm not sure that's feasible. Adding kexec maintainers for input. In a vacuum, that'd be my preference too. It's the obvious choice IMO, e.g= . the kexec_image->preserve_context path does syscore_suspend() (and then resume(= ), so it's not completely uncharted territory. However, there's a rather big wrinkle in that not all of the existing .shut= down() implementations are obviously ok to call during kexec. Luckily, AFAICT the= re are very few users of the syscore .shutdown hook, so it's at least feasible to = go that route. x86's mce_syscore_shutdown() should be ok, and arguably is correct, e.g. I = don't see how leaving #MC reporting enabled across kexec can work. ledtrig_cpu_syscore_shutdown() is also likely ok and arguably correct. The interrupt controllers though? x86 disables IRQs at the very beginning = of machine_kexec(), so it's likely fine. But every other architecture? No cl= ue. E.g. PPC's default_machine_kexec() sends IPIs to shutdown other CPUs, thoug= h I have no idea if that can run afoul of any of the paths below. arch/powerpc/platforms/cell/spu_base.c .shutdown =3D spu_shutdown, arch/x86/kernel/cpu/mce/core.c .shutdown =3D mce_syscore_sh= utdown, arch/x86/kernel/i8259.c .shutdown =3D i8259A_shutdo= wn, drivers/irqchip/irq-i8259.c .shutdown =3D i8259A_shutdown, drivers/irqchip/irq-sun6i-r.c .shutdown =3D sun6i_r_intc_sh= utdown, drivers/leds/trigger/ledtrig-cpu.c .shutdown =3D ledtrig_cpu_syscor= e_shutdown, drivers/power/reset/sc27xx-poweroff.c .shutdown =3D sc27xx_poweroff= _shutdown, kernel/irq/generic-chip.c .shutdown =3D irq_gc_shutdown, virt/kvm/kvm_main.c .shutdown =3D kvm_shutdown, The whole thing is a bit of a mess. E.g. x86 treats machine_shutdown() fro= m kexec pretty much the same as shutdown for reboot, but other architectures = have what appear to be unique paths for handling kexec. FWIW, if we want to go with option (b), syscore_shutdown() hooks could use kexec_in_progress to differentiate between "regular" shutdown/reboot and ke= xec.