Received: by 2002:a05:6a10:144:0:0:0:0 with SMTP id 4csp660221pxw; Fri, 8 Apr 2022 18:29:31 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwTuukafidwYipRnOdJPBraZeaPjfqquIiFfnRuPApmGaAseWOjqcMyrlM02w9wS32TzAn1 X-Received: by 2002:a17:906:6a21:b0:6e8:6cad:23f9 with SMTP id qw33-20020a1709066a2100b006e86cad23f9mr1446754ejc.75.1649467770879; Fri, 08 Apr 2022 18:29:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1649467770; cv=none; d=google.com; s=arc-20160816; b=QlRSTjvho1PfYZWycIvYm3qpdh+D1BZI4ddBAeAKwpesnxISDyjpxHCdl0KrDs6+1a P0PtwiXZkM5mFDUquL50pbJyRir+/iRPqWs9q5vLozIuz0z4Dz0BYx+tlDT/kWJ4LqBj by7mkyPbVM4Ba7ckKRQuGNycxOAjaHUJtz4Zcl453S7ktllx35MWqBchUoyCmGtZ70si rxpUuI/mTJ+P6d0cYuBeeKi98g9os6UGMOqLU/a4OGB+zR429PtI6aEGUpXUz+LTVqFl ZCSaMSab5m9evsiK7nLG71/jaxvTEddzuDGwUpLONC9UkGU4DxkOWhyvTll4L5ILytj4 FbXw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=jxF3Dyu1X1xHy/gJs5i8fe1sRO+qtd0N+C6StYXkMBc=; b=wQ1gehOFxndVKvgE+Wn5yL8VkVJLfxZGU7/weTeevMtpoKaNVvTJLBbykc+m2gwx1K pKWmIU/McduzefG7aix84TYUTsNMfz7IB6GGhZWCsIYKfGikJzcJJNxLFfYH5nD6kJ0W 95x18TbRJZ0VJ2U2f/R1qZigN78qgCg69lr7ueWv9XKuB53Yh4YbodwPjugODgRPobEk m08o6E7u6GLTwElSRApN70AQ2IkhTAhSHCdK/2/vPuSi3o0ldP18YmrrFcKEPL2Zuwyv BKMAAIwv1TLLw5dIQjMEt/31hWKJnvGytZKEPEOsYxAmQ+sMCBEhFMfKJMm77PDK9Bu2 75Hw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=RjnOTixh; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id qq21-20020a17090720d500b006e834cdabedsi1992888ejb.291.2022.04.08.18.29.06; Fri, 08 Apr 2022 18:29:30 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=RjnOTixh; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233472AbiDHQnZ (ORCPT + 99 others); Fri, 8 Apr 2022 12:43:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55128 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230014AbiDHQnX (ORCPT ); Fri, 8 Apr 2022 12:43:23 -0400 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B652A10E9; Fri, 8 Apr 2022 09:41:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1649436079; x=1680972079; h=message-id:date:mime-version:subject:to:cc:references: from:in-reply-to:content-transfer-encoding; bh=aAbi/qmKq+r/hL4lQDiEEDWm+BnU3Db3wanrG3/Vskg=; b=RjnOTixhwA9lwoKCBk2+xaerM8hMPcB47cZjmyeOTLzJCbS2JSxn0LBD taT1hPg2zYxrZUo5uddpjvwlZT+fvhzQIEuQeRztUqxT+2kJRsSG6sStz oWUeZzW6Lv95vmEJlLUrBSvCflKl07AKzA5IB6UFhJVrUnhNvNnQUIN9K kVn20bf5SYiMVKl5YHwbtJ7bIk8FWezleaM21bejZDqj6ggzf8gu5S4tQ 5CbsM82QnMtWxc+XtatlzH0elCB+rrx5JxERE8l1YEQ7iYEpcVkHzM7eb SGGaifQVyR2UriXi9Kjo9T2BgH+pODCU0ulORxV/FU1hL/sBpQeqXzH2/ g==; X-IronPort-AV: E=McAfee;i="6400,9594,10310"; a="243768602" X-IronPort-AV: E=Sophos;i="5.90,245,1643702400"; d="scan'208";a="243768602" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Apr 2022 09:41:19 -0700 X-IronPort-AV: E=Sophos;i="5.90,245,1643702400"; d="scan'208";a="571547038" Received: from zengguan-mobl1.ccr.corp.intel.com (HELO [10.254.213.22]) ([10.254.213.22]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Apr 2022 09:41:14 -0700 Message-ID: Date: Sat, 9 Apr 2022 00:41:03 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0 Subject: Re: [PATCH v7 8/8] KVM: VMX: enable IPI virtualization Content-Language: en-US To: Sean Christopherson Cc: Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , "kvm@vger.kernel.org" , Dave Hansen , "Luck, Tony" , Kan Liang , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Kim Phillips , Jarkko Sakkinen , Jethro Beekman , "Huang, Kai" , "x86@kernel.org" , "linux-kernel@vger.kernel.org" , "Hu, Robert" , "Gao, Chao" References: <20220304080725.18135-1-guang.zeng@intel.com> <20220304080725.18135-9-guang.zeng@intel.com> <54df6da8-ad68-cc75-48db-d18fc87430e9@intel.com> From: Zeng Guang In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.9 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 4/5/2022 1:57 AM, Sean Christopherson wrote: > On Sun, Apr 03, 2022, Zeng Guang wrote: >> On 4/1/2022 10:37 AM, Sean Christopherson wrote: >>>> @@ -4219,14 +4226,21 @@ static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu) >>>> pin_controls_set(vmx, vmx_pin_based_exec_ctrl(vmx)); >>>> if (cpu_has_secondary_exec_ctrls()) { >>>> - if (kvm_vcpu_apicv_active(vcpu)) >>>> + if (kvm_vcpu_apicv_active(vcpu)) { >>>> secondary_exec_controls_setbit(vmx, >>>> SECONDARY_EXEC_APIC_REGISTER_VIRT | >>>> SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); >>>> - else >>>> + if (enable_ipiv) >>>> + tertiary_exec_controls_setbit(vmx, >>>> + TERTIARY_EXEC_IPI_VIRT); >>>> + } else { >>>> secondary_exec_controls_clearbit(vmx, >>>> SECONDARY_EXEC_APIC_REGISTER_VIRT | >>>> SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); >>>> + if (enable_ipiv) >>>> + tertiary_exec_controls_clearbit(vmx, >>>> + TERTIARY_EXEC_IPI_VIRT); >>> Oof. The existing code is kludgy. We should never reach this point without >>> enable_apicv=true, and enable_apicv should be forced off if APICv isn't supported, >>> let alone seconary exec being support. >>> >>> Unless I'm missing something, throw a prep patch earlier in the series to drop >>> the cpu_has_secondary_exec_ctrls() check, that will clean this code up a smidge. >> cpu_has_secondary_exec_ctrls() check can avoid wrong vmcs write in case mistaken >> invocation. > KVM has far bigger problems on buggy invocation, and in that case the resulting > printk + WARN from the failed VMWRITE is a good thing. SDM doesn't define VMWRITE failure for such case. But it says the logical processor operates as if all the secondary processor-based VM-execution controls were 0 if "activate secondary controls" primary processor-based VM-execution control is 0. So we may add WARN() to detect this kind of buggy invocation instead. diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 61e075e16c19..6c370b507b45 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -4200,22 +4200,22 @@ static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)         struct vcpu_vmx *vmx = to_vmx(vcpu);         pin_controls_set(vmx, vmx_pin_based_exec_ctrl(vmx)); -       if (cpu_has_secondary_exec_ctrls()) { -               if (kvm_vcpu_apicv_active(vcpu)) { -                       secondary_exec_controls_setbit(vmx, - SECONDARY_EXEC_APIC_REGISTER_VIRT | - SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); -                       if (enable_ipiv) -                               tertiary_exec_controls_setbit(vmx, - TERTIARY_EXEC_IPI_VIRT); -               } else { -                       secondary_exec_controls_clearbit(vmx, - SECONDARY_EXEC_APIC_REGISTER_VIRT | - SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); -                       if (enable_ipiv) - tertiary_exec_controls_clearbit(vmx, - TERTIARY_EXEC_IPI_VIRT); -               } + +       WARN(!cpu_has_secondary_exec_ctrls(), +                    "VMX: unexpected vmwrite with inactive secondary exec controls"); + +       if (kvm_vcpu_apicv_active(vcpu)) { +               secondary_exec_controls_setbit(vmx, +                             SECONDARY_EXEC_APIC_REGISTER_VIRT | + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); +               if (enable_ipiv) +                       tertiary_exec_controls_setbit(vmx, TERTIARY_EXEC_IPI_VIRT); +       } else { +               secondary_exec_controls_clearbit(vmx, +                             SECONDARY_EXEC_APIC_REGISTER_VIRT | + SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); +               if (enable_ipiv) +                       tertiary_exec_controls_clearbit(vmx, TERTIARY_EXEC_IPI_VIRT);         }         vmx_update_msr_bitmap_x2apic(vcpu); > >>>> + */ >>>> + if (vmx_can_use_ipiv(vcpu->kvm)) { >>>> + struct kvm_vmx *kvm_vmx = to_kvm_vmx(vcpu->kvm); >>>> + >>>> + mutex_lock(&vcpu->kvm->lock); >>>> + err = vmx_alloc_pid_table(kvm_vmx); >>>> + mutex_unlock(&vcpu->kvm->lock); >>> This belongs in vmx_vm_init(), doing it in vCPU creation is a remnant of the >>> dynamic resize approach that's no longer needed. >> We cannot allocate pid table in vmx_vm_init() as userspace has no chance to >> set max_vcpu_ids at this stage. That's the reason we do it in vCPU creation >> instead. > Ah, right. Hrm. And that's going to be a recurring problem if we try to use the > dynamic kvm->max_vcpu_ids to reduce other kernel allocations. > > Argh, and even kvm_arch_vcpu_precreate() isn't protected by kvm->lock. > > Taking kvm->lock isn't problematic per se, I just hate doing it so deep in a > per-vCPU flow like this. > > A really gross hack/idea would be to make this 64-bit only and steal the upper > 32 bits of @type in kvm_create_vm() for the max ID. > > I think my first choice would be to move kvm_arch_vcpu_precreate() under kvm->lock. > None of the architectures that have a non-nop implemenation (s390, arm64 and x86) > do significant work, so holding kvm->lock shouldn't harm performance. s390 has to > acquire kvm->lock in its implementation, so we could drop that. And looking at > arm64, I believe its logic should also be done under kvm->lock. > > It'll mean adding yet another kvm_x86_ops, but I like that more than burying the > code deep in vCPU creation. > > Paolo, any thoughts on this? Sounds reasonable. I will prepare patch to refactor the kvm_arch_vcpu_precreate() and make pid table allocation done there. Thanks.