Message-ID: <43c18a3d57305cf52a1c3643fa8f714ae3769551.camel@redhat.com>
Subject: Re: [RFC PATCH] KVM: x86: inhibit APICv upon detecting direct APIC access from L2
From: Maxim Levitsky
To: Ake Koomsin, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, "H . Peter Anvin"
Date: Mon, 07 Aug 2023 17:00:58 +0300
In-Reply-To: <20230807062611.12596-1-ake@igel.co.jp>
References: <20230807062611.12596-1-ake@igel.co.jp>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2023-08-07 at 15:26 +0900, Ake Koomsin wrote:
> Current KVM does not expect L1 hypervisor to allow L2 guest to access
> APIC page directly when APICv is enabled. When this happens, KVM
> emulates the access itself resulting in interrupt lost.
>
> As this kind of hypervisor is rare, it is simpler to inhibit APICv upon
> detecting direct APIC access from L2 to avoid unexpected interrupt lost.
>
> Signed-off-by: Ake Koomsin
> ---
>  arch/x86/include/asm/kvm_host.h |  6 ++++++
>  arch/x86/kvm/mmu/mmu.c          | 33 ++++++++++++++++++++++++++-------
>  arch/x86/kvm/svm/svm.h          |  3 ++-
>  arch/x86/kvm/vmx/vmx.c          |  3 ++-
>  4 files changed, 36 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 3bc146dfd38d..8764b11922a0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1188,6 +1188,12 @@ enum kvm_apicv_inhibit {
>  	APICV_INHIBIT_REASON_APIC_ID_MODIFIED,
>  	APICV_INHIBIT_REASON_APIC_BASE_MODIFIED,
>
> +	/*
> +	 * APICv is disabled because L1 hypervisor allows L2 guest to access
> +	 * APIC directly.
> +	 */
> +	APICV_INHIBIT_REASON_L2_PASSTHROUGH_ACCESS,
> +
>  	/******************************************************/
>  	/* INHIBITs that are relevant only to the AMD's AVIC. */
>  	/******************************************************/
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index ec169f5c7dce..c1150ef9fce1 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4293,6 +4293,30 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
>  	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true, NULL);
>  }
>
> +static int __kvm_faultin_pfn_guest_mode(struct kvm_vcpu *vcpu,
> +					struct kvm_page_fault *fault)
> +{
> +	struct kvm_memory_slot *slot = fault->slot;
> +
> +	/* Don't expose private memslots to L2. */
> +	fault->slot = NULL;
> +	fault->pfn = KVM_PFN_NOSLOT;
> +	fault->map_writable = false;
> +
> +	/*
> +	 * APICv does not work when L1 hypervisor allows L2 guest to access
> +	 * APIC directly. As this kind of L1 hypervisor is rare, it is simpler
> +	 * to inhibit APICv when we detect direct APIC access from L2, and
> +	 * fallback to emulation path to avoid interrupt lost.
> +	 */
> +	if (unlikely(slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT &&
> +		     kvm_apicv_activated(vcpu->kvm)))
> +		kvm_set_apicv_inhibit(vcpu->kvm,
> +			APICV_INHIBIT_REASON_L2_PASSTHROUGH_ACCESS);

Is there a good reason why KVM doesn't expose the APIC memslot to a nested guest?

While a nested guest runs, L1's APICv is effectively "inhibited" anyway, so
writes to this memslot should update the APIC registers and be picked up by the
APICv hardware when L1 resumes execution.

Since APICv allows itself to be inhibited for other reasons, it should, just
like AVIC, be able to pick up arbitrary changes to APIC registers that happened
while it was inhibited.

I'll take a look at the code to see if APICv does this (I know AVIC's code much
better than APICv's).

Is there a reproducer for this bug?

Best regards,
	Maxim Levitsky

> +
> +	return RET_PF_CONTINUE;
> +}
> +
>  static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  {
>  	struct kvm_memory_slot *slot = fault->slot;
> @@ -4307,13 +4331,8 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  		return RET_PF_RETRY;
>
>  	if (!kvm_is_visible_memslot(slot)) {
> -		/* Don't expose private memslots to L2. */
> -		if (is_guest_mode(vcpu)) {
> -			fault->slot = NULL;
> -			fault->pfn = KVM_PFN_NOSLOT;
> -			fault->map_writable = false;
> -			return RET_PF_CONTINUE;
> -		}
> +		if (is_guest_mode(vcpu))
> +			return __kvm_faultin_pfn_guest_mode(vcpu, fault);
>  		/*
>  		 * If the APIC access page exists but is disabled, go directly
>  		 * to emulation without caching the MMIO access or creating a
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 18af7e712a5a..8d77932ee0fb 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -683,7 +683,8 @@ extern struct kvm_x86_nested_ops svm_nested_ops;
>  	BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) | \
>  	BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) | \
>  	BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) | \
> -	BIT(APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED) \
> +	BIT(APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED) | \
> +	BIT(APICV_INHIBIT_REASON_L2_PASSTHROUGH_ACCESS) \
>  )
>
>  bool avic_hardware_setup(void);
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index df461f387e20..f652397c9765 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8189,7 +8189,8 @@ static void vmx_hardware_unsetup(void)
>  	BIT(APICV_INHIBIT_REASON_BLOCKIRQ) | \
>  	BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) | \
>  	BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) | \
> -	BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) \
> +	BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) | \
> +	BIT(APICV_INHIBIT_REASON_L2_PASSTHROUGH_ACCESS) \
>  )
>
>  static void vmx_vm_destroy(struct kvm *kvm)
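
For anyone following the thread, the inhibit bookkeeping that the quoted patch
extends can be modeled in a few lines of user-space C. This is only a
simplified sketch, not KVM code: every toy_* name is hypothetical and the
reason numbers are made up. It takes from the diff that each vendor lists the
reasons it honors in a BIT() mask, and it assumes (about code not shown in the
diff) that APICv counts as activated only while no inhibit reason is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reason numbers; the real enum lives in kvm_host.h. */
enum toy_inhibit_reason {
	TOY_REASON_APIC_ID_MODIFIED,
	TOY_REASON_APIC_BASE_MODIFIED,
	TOY_REASON_L2_PASSTHROUGH_ACCESS,
};

#define TOY_BIT(n) (1UL << (n))

struct toy_vm {
	unsigned long required_inhibits;	/* reasons this vendor honors */
	unsigned long inhibit_reasons;		/* reasons currently set */
};

/* Assumption: "activated" means no inhibit reason is currently set. */
static bool toy_apicv_activated(const struct toy_vm *vm)
{
	return vm->inhibit_reasons == 0;
}

/* Set a reason, but only if the vendor mask lists it. */
static void toy_set_inhibit(struct toy_vm *vm, enum toy_inhibit_reason r)
{
	if (vm->required_inhibits & TOY_BIT(r))
		vm->inhibit_reasons |= TOY_BIT(r);
}

/* Clear a reason; the toy VM is activated again once all are cleared. */
static void toy_clear_inhibit(struct toy_vm *vm, enum toy_inhibit_reason r)
{
	vm->inhibit_reasons &= ~TOY_BIT(r);
}

/* Walk the toy VM through an L2 passthrough access being detected. */
static void toy_demo(void)
{
	struct toy_vm vm = {
		.required_inhibits = TOY_BIT(TOY_REASON_APIC_BASE_MODIFIED) |
				     TOY_BIT(TOY_REASON_L2_PASSTHROUGH_ACCESS),
	};

	assert(toy_apicv_activated(&vm));
	toy_set_inhibit(&vm, TOY_REASON_L2_PASSTHROUGH_ACCESS);
	assert(!toy_apicv_activated(&vm));
	toy_clear_inhibit(&vm, TOY_REASON_L2_PASSTHROUGH_ACCESS);
	assert(toy_apicv_activated(&vm));
}
```

Calling toy_demo() from a main() runs the checks. As far as the quoted diff
shows, the patch only ever sets the new reason; toy_clear_inhibit() is there
just to show the symmetric operation.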