Received: by 2002:a05:6359:c8b:b0:c7:702f:21d4 with SMTP id go11csp4667866rwb; Tue, 20 Sep 2022 18:16:06 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5YJT88vFMFtZ+5XmfKfIh6j0HFm3ayVz8nvS1ZaFvgkmbYd1RYJFeyMfNVoLNzyncKlt+W X-Received: by 2002:a17:90b:1b4c:b0:202:c1a3:25ce with SMTP id nv12-20020a17090b1b4c00b00202c1a325cemr6981397pjb.232.1663722965918; Tue, 20 Sep 2022 18:16:05 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1663722965; cv=none; d=google.com; s=arc-20160816; b=GWmVqpG4sZ2AZptIwBJTXQcC5z7sqPRC2lLYTvvSZjQbGguJFKB+HANajxWBfIo5OW j6FEU/Oihd8ENNQA5DY6T2O31HdiriqjleppuSPCQtpURj6WfFJ+5EI6YTpzSuXZrJ/a 5xdwAVqAbGqkjxQ+eSxEVIlHlYdreOzrOaPLXdnz3QtP4K3WiQ/3T/KG+QKutRDXmB/l PUZvNjPuIYlw1YSLyrhz0RYOw5eyERK5VCbkwtTTBuBZp47+4VLoz1Hf5QAToz7OfQ3D GMYEDAO25vi/iXlZbvNIKqm/4bJxoiWTE1ITWsLcgNjPw0oHhMKLFFy9/JP9AH/SW1LJ WwNw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:reply-to:dkim-signature; bh=+nmtDj/EH/uK0SyJnyCY1A9LxFaUnx5384By8rF2AcA=; b=ZN/4raHHbhLgtKt38ST2nqdf1Wv1YpuzYN/kb4nLmu+J5lt9252V0x2HuH228bnb4E Sx6UzrzAVeMIMyf/pIIz5JVdhLIiTi/+F+t5qmaKd4f/9wc5qyy+WH3ss0mM0dRyTceR fn7bBpGjcY8c2byR6pYoHG51b9DfjrSgS+CDNMGEkMNQyZnSekgpIK66G+NJxCnBlB// KZZkiY2T3hw3D1/Y6OuNCSzwmfAHpepj4Ec40SBI9aLeydeuoYWYhs9gmvNKp5oMHZnl BJ5EC+wd+c9eMxJtnxMP7LUK0ZdMNwclugnKL5X1UbHakvv0I1rPu0H8PxJBrOkCjPL/ /Cww== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b="rpWC/VKj"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id h15-20020a63c00f000000b004391c298544si1390459pgg.60.2022.09.20.18.15.52; Tue, 20 Sep 2022 18:16:05 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b="rpWC/VKj"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231546AbiIUAdt (ORCPT + 99 others); Tue, 20 Sep 2022 20:33:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34288 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231462AbiIUAdN (ORCPT ); Tue, 20 Sep 2022 20:33:13 -0400 Received: from mail-pj1-x104a.google.com (mail-pj1-x104a.google.com [IPv6:2607:f8b0:4864:20::104a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5426B79692 for ; Tue, 20 Sep 2022 17:33:03 -0700 (PDT) Received: by mail-pj1-x104a.google.com with SMTP id pc10-20020a17090b3b8a00b00202be8d81d2so7385016pjb.1 for ; Tue, 20 Sep 2022 17:33:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date; bh=+nmtDj/EH/uK0SyJnyCY1A9LxFaUnx5384By8rF2AcA=; b=rpWC/VKjogxaSKricNAD/sRggDf2m+ppcZ99cC4f6M2lXn9866lNJfALVMtFhxatIn qQLLRO1BUtEoQdRvY1yCuXD/CtkIEdb3+zCRL0TNiOdne1CQBhI/BizUOruqlzoGL1ru VCaVy6ynkLdl8Y2SlQn7F72+sRnXKS9nW+9M+WtquUBaCslXgLIbNwAo/KbGQ39Ut11+ iX38zqjv00zbeQfFOy8nLz7/nmQA6SCthL08Bvu2Mg/I39d68rA1efgAA/OI3nSBX2ld cXZNYtWAos3dWx8qB/6lW1KWT1k2gQ6b3mCS+VL9cYwYuehjdDCD0p4cY7hCsI3NeNJV wlPQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date; bh=+nmtDj/EH/uK0SyJnyCY1A9LxFaUnx5384By8rF2AcA=; b=vqO+6uFuPspKnhnHQnRJkCS9GeI5KyHOS3XWKiF5QqYfPqH0seluipe3naxPQOs2gY fypyr62Ydve0keHGvfZmcdhuAZvhnrpr/d1O3UD80qLZ5IKh+Y2Eqo2eXtIK0eunh59z HL3zjMiGt1ht6Fv2NUsVBAAzO7V49X2cJMUZyJ5lejif8ekpBhmnPGhIrBBVDYxVHgqA Pk/PpW9nnSR2MSgYjUeAenQOhXHJ5ABPCizo6ItqoQ211wP5XgozujtnvCcJRkWDaL3B Pq22LvVh5YpJY0QrD9Js7BWQxYPYQT57m7TBHRND//izFgAnfhkI+opBH/Uz6+4puzEi jcuw== X-Gm-Message-State: ACrzQf0w7blIhmreAllXYA4SlqBpqur3RMvH1ab61YgeJx2I9YaBM6iu +oWVxpxxV8oOByQVkVstG/zyqI9KGy0= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a17:90b:4a43:b0:202:7706:73d7 with SMTP id lb3-20020a17090b4a4300b00202770673d7mr6477790pjb.137.1663720382986; Tue, 20 Sep 2022 17:33:02 -0700 (PDT) Reply-To: Sean Christopherson Date: Wed, 21 Sep 2022 00:31:58 +0000 In-Reply-To: <20220921003201.1441511-1-seanjc@google.com> Mime-Version: 1.0 References: <20220921003201.1441511-1-seanjc@google.com> X-Mailer: git-send-email 2.37.3.968.ga6b4b080e4-goog Message-ID: <20220921003201.1441511-10-seanjc@google.com> Subject: [PATCH v4 09/12] KVM: x86: Don't snapshot pending INIT/SIPI prior to checking nested events From: Sean Christopherson To: Paolo Bonzini , Marc Zyngier , Huacai Chen , Aleksandar Markovic , Anup Patel , Paul Walmsley , Palmer Dabbelt , Albert Ou , Christian Borntraeger , Janosch Frank , Claudio Imbrenda , Sean Christopherson Cc: James Morse , Alexandru Elisei , Suzuki K Poulose , Oliver Upton , Atish Patra , David Hildenbrand , kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Maxim Levitsky Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Don't snapshot pending INIT/SIPI events prior to checking nested events, architecturally there's nothing wrong with KVM processing (dropping) a SIPI that is received immediately after synthesizing a VM-Exit. Taking and consuming the snapshot makes the flow way more subtle than it needs to be, e.g. nVMX consumes/clears events that trigger VM-Exit (INIT/SIPI), and so at first glance it appears that KVM is double-dipping on pending INITs and SIPIs. But that's not the case because INIT is blocked unconditionally in VMX root mode the CPU cannot be in wait-for_SIPI after VM-Exit, i.e. the paths that truly consume the snapshot are unreachable if apic->pending_events is modified by kvm_check_nested_events(). nSVM is a similar story as GIF is cleared by the CPU on VM-Exit; INIT is blocked regardless of whether or not it was pending prior to VM-Exit. Drop the snapshot logic so that a future fix doesn't create weirdness when kvm_vcpu_running()'s call to kvm_check_nested_events() is moved to vcpu_block(). In that case, kvm_check_nested_events() will be called immediately before kvm_apic_accept_events(), which raises the obvious question of why that change doesn't break the snapshot logic. Note, there is a subtle functional change. Previously, KVM would clear pending SIPIs if and only SIPI was pending prior to VM-Exit, whereas now KVM clears pending SIPI unconditionally if INIT+SIPI are blocked. The latter is architecturally allowed, as SIPI is ignored if the CPU is not in wait-for-SIPI mode (arguably, KVM should be even more aggressive in dropping SIPIs). It is software's responsibility to ensure the SIPI is delivered, i.e. software shouldn't be firing INIT-SIPI at a CPU until it knows with 100% certaining that the target CPU isn't in VMX root mode. Furthermore, the existing code is extra weird as SIPIs that arrive after VM-Exit _are_ dropped if there also happened to be a pending SIPI before VM-Exit. Signed-off-by: Sean Christopherson --- arch/x86/kvm/lapic.c | 36 ++++++++++-------------------------- 1 file changed, 10 insertions(+), 26 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 2bd90effc653..d7639d126e6c 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -3025,17 +3025,8 @@ int kvm_apic_accept_events(struct kvm_vcpu *vcpu) struct kvm_lapic *apic = vcpu->arch.apic; u8 sipi_vector; int r; - unsigned long pe; - if (!lapic_in_kernel(vcpu)) - return 0; - - /* - * Read pending events before calling the check_events - * callback. - */ - pe = smp_load_acquire(&apic->pending_events); - if (!pe) + if (!kvm_apic_has_pending_init_or_sipi(vcpu)) return 0; if (is_guest_mode(vcpu)) { @@ -3043,38 +3034,31 @@ int kvm_apic_accept_events(struct kvm_vcpu *vcpu) if (r < 0) return r == -EBUSY ? 0 : r; /* - * If an event has happened and caused a vmexit, - * we know INITs are latched and therefore - * we will not incorrectly deliver an APIC - * event instead of a vmexit. + * Continue processing INIT/SIPI even if a nested VM-Exit + * occurred, e.g. pending SIPIs should be dropped if INIT+SIPI + * are blocked as a result of transitioning to VMX root mode. */ } /* - * INITs are blocked while CPU is in specific states - * (SMM, VMX root mode, SVM with GIF=0). - * Because a CPU cannot be in these states immediately - * after it has processed an INIT signal (and thus in - * KVM_MP_STATE_INIT_RECEIVED state), just eat SIPIs - * and leave the INIT pending. + * INITs are blocked while CPU is in specific states (SMM, VMX root + * mode, SVM with GIF=0), while SIPIs are dropped if the CPU isn't in + * wait-for-SIPI (WFS). */ if (!kvm_apic_init_sipi_allowed(vcpu)) { WARN_ON_ONCE(vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED); - if (test_bit(KVM_APIC_SIPI, &pe)) - clear_bit(KVM_APIC_SIPI, &apic->pending_events); + clear_bit(KVM_APIC_SIPI, &apic->pending_events); return 0; } - if (test_bit(KVM_APIC_INIT, &pe)) { - clear_bit(KVM_APIC_INIT, &apic->pending_events); + if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) { kvm_vcpu_reset(vcpu, true); if (kvm_vcpu_is_bsp(apic->vcpu)) vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; else vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED; } - if (test_bit(KVM_APIC_SIPI, &pe)) { - clear_bit(KVM_APIC_SIPI, &apic->pending_events); + if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events)) { if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) { /* evaluate pending_events before reading the vector */ smp_rmb(); -- 2.37.3.968.ga6b4b080e4-goog