From: Maxim Levitsky
To: kvm@vger.kernel.org
Cc: Will Deacon, Borislav Petkov, Dave Hansen, Suravee Suthikulpanit,
    Thomas Gleixner, Paolo Bonzini, x86@kernel.org, Robin Murphy,
    iommu@lists.linux.dev, Ingo Molnar, Joerg Roedel, Sean Christopherson,
    "H. Peter Anvin", linux-kernel@vger.kernel.org, Maxim Levitsky
Subject: [PATCH 5/5] x86: KVM: SVM: workaround for AVIC's errata #1235
Date: Thu, 28 Sep 2023 18:04:28 +0300
Message-Id: <20230928150428.199929-6-mlevitsk@redhat.com>
In-Reply-To: <20230928150428.199929-1-mlevitsk@redhat.com>
References: <20230928150428.199929-1-mlevitsk@redhat.com>

On Zen2 (and likely on Zen1 as well), AVIC doesn't reliably detect a change
in the 'is_running' bit during ICR write emulation and might skip a VM exit
if that bit was recently cleared.

The absence of the VM exit leads to KVM not waking up / triggering a nested
VM exit on the target(s) of the IPI, which can, in some cases, lead to
unbounded delays in guest execution.

As I recently discovered, a reasonable workaround exists: make KVM never set
the is_running bit.

This workaround ensures that (*) all ICR writes always cause a VM exit and
are therefore correctly emulated, at the expense of never enjoying VM
exit-less ICR emulation.

This workaround does carry a performance penalty, but according to my
benchmarks it is still much better than not using AVIC at all, because AVIC
is still used for the receiving end of the IPIs and for posted interrupts.

If the user is aware of the errata and it doesn't affect their workload,
the user can disable the workaround with 'avic_zen2_errata_workaround=0'.

(*) More correctly, all ICR writes except when the 'Self' shorthand is used:
in this case AVIC skips reading the physid table and just sets bits in the
IRR of the local APIC. Thankfully, the errata cannot occur in this case, so
an extra workaround for it is not needed.

Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/svm/avic.c | 50 +++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index c44b65af494e3ff..df9efa428f86aa9 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -62,6 +62,9 @@ static_assert(__AVIC_GATAG(AVIC_VM_ID_MASK, AVIC_VCPU_ID_MASK) == -1u);
 static bool force_avic;
 module_param_unsafe(force_avic, bool, 0444);
 
+static int avic_zen2_errata_workaround = -1;
+module_param(avic_zen2_errata_workaround, int, 0444);
+
 /* Note:
  * This hash table is used to map VM_ID to a struct kvm_svm,
  * when handling AMD IOMMU GALOG notification to schedule in
@@ -1027,7 +1030,6 @@ avic_update_iommu_vcpu_affinity(struct kvm_vcpu *vcpu, int cpu, bool r)
 
 void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
-	u64 entry;
 	int h_physical_id = kvm_cpu_get_apicid(cpu);
 	struct vcpu_svm *svm = to_svm(vcpu);
 	unsigned long flags;
@@ -1056,14 +1058,18 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	 */
 	spin_lock_irqsave(&svm->ir_list_lock, flags);
 
-	entry = READ_ONCE(*(svm->avic_physical_id_cache));
-	WARN_ON_ONCE(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
+	if (!avic_zen2_errata_workaround) {
+		u64 entry = READ_ONCE(*(svm->avic_physical_id_cache));
 
-	entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;
-	entry |= (h_physical_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK);
-	entry |= AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
+		WARN_ON_ONCE(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
+
+		entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;
+		entry |= (h_physical_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK);
+		entry |= AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
+
+		WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
+	}
 
-	WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
 	avic_update_iommu_vcpu_affinity(vcpu, h_physical_id, true);
 
 	spin_unlock_irqrestore(&svm->ir_list_lock, flags);
@@ -1071,7 +1077,7 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 void avic_vcpu_put(struct kvm_vcpu *vcpu)
 {
-	u64 entry;
+	u64 entry = 0;
 	struct vcpu_svm *svm = to_svm(vcpu);
 	unsigned long flags;
 
@@ -1084,11 +1090,13 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
 	 * can't be scheduled out and thus avic_vcpu_{put,load}() can't run
 	 * recursively.
 	 */
-	entry = READ_ONCE(*(svm->avic_physical_id_cache));
 
-	/* Nothing to do if IsRunning == '0' due to vCPU blocking. */
-	if (!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK))
-		return;
+	if (!avic_zen2_errata_workaround) {
+		/* Nothing to do if IsRunning == '0' due to vCPU blocking. */
+		entry = READ_ONCE(*(svm->avic_physical_id_cache));
+		if (!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK))
+			return;
+	}
 
 	/*
 	 * Take and hold the per-vCPU interrupt remapping lock while updating
@@ -1102,8 +1110,10 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
 
 	avic_update_iommu_vcpu_affinity(vcpu, -1, 0);
 
-	entry &= ~AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
-	WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
+	if (!avic_zen2_errata_workaround) {
+		entry &= ~AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;
+		WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
+	}
 
 	spin_unlock_irqrestore(&svm->ir_list_lock, flags);
 
@@ -1217,5 +1227,17 @@ bool avic_hardware_setup(void)
 
 	amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier);
 
+	if (avic_zen2_errata_workaround == -1) {
+
+		/* Assume that Zen1 and Zen2 have errata #1235 */
+		if (boot_cpu_data.x86 == 0x17)
+			avic_zen2_errata_workaround = 1;
+		else
+			avic_zen2_errata_workaround = 0;
+	}
+
+	if (avic_zen2_errata_workaround)
+		pr_info("Workaround for AVIC errata #1235 is enabled\n");
+
 	return true;
 }
-- 
2.26.3
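
For readers who want to reason about the behaviour without a kernel tree,
the logic of the patch can be modelled in user space. The following is only
a rough sketch under stated assumptions, not kernel code: the helper names
(resolve_workaround, model_vcpu_load, model_vcpu_put, icr_write_exits) and
the MODEL_* constants are hypothetical, and the bit positions are chosen for
illustration rather than taken from the kernel headers. It models three
things from the patch: the tri-state parameter (-1 means auto-detect on
family 0x17), the physical ID entry being left untouched on load/put when
the workaround is on, and the resulting property that an ICR write targeting
such an entry always takes a VM exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative entry layout only (assumed for this sketch). */
#define MODEL_HOST_ID_MASK 0xfffULL     /* host physical APIC ID field */
#define MODEL_IS_RUNNING   (1ULL << 62) /* IsRunning bit */

/*
 * Resolve the tri-state parameter as avic_hardware_setup() does in the
 * patch: -1 selects auto-detection (on for CPU family 0x17), any other
 * value is left as given (0 = off, non-zero = on).
 */
static int resolve_workaround(int param, unsigned int cpu_family)
{
	if (param == -1)
		return cpu_family == 0x17 ? 1 : 0;
	return param;
}

/*
 * Model of the entry update on vCPU load: with the workaround enabled the
 * entry is not modified, so IsRunning is never set.
 */
static uint64_t model_vcpu_load(uint64_t entry, unsigned int host_apic_id,
				int workaround)
{
	if (!workaround) {
		entry &= ~MODEL_HOST_ID_MASK;
		entry |= host_apic_id & MODEL_HOST_ID_MASK;
		entry |= MODEL_IS_RUNNING;
	}
	return entry;
}

/* Model of the entry update on vCPU put: clear IsRunning unless skipped. */
static uint64_t model_vcpu_put(uint64_t entry, int workaround)
{
	if (!workaround)
		entry &= ~MODEL_IS_RUNNING;
	return entry;
}

/*
 * Simplified model of the hardware decision: an ICR write aimed at a
 * target whose IsRunning bit is clear must exit to the hypervisor.
 */
static bool icr_write_exits(uint64_t target_entry)
{
	return !(target_entry & MODEL_IS_RUNNING);
}
```

In this model, icr_write_exits(model_vcpu_load(0, 5, 1)) is true for any
host APIC ID, which is the property the commit message relies on, while
with the workaround off the same call is false until model_vcpu_put()
clears the bit again.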