Subject: Re: [PATCH v2] KVM: VMX: Enable Notify VM exit
To: Sean Christopherson
Cc: Chenyi Qiang, pbonzini@redhat.com, vkuznets@redhat.com, wanpengli@tencent.com, jmattson@google.com, joro@8bytes.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20210525051204.1480610-1-tao3.xu@intel.com> <080602dc-f998-ec13-ddf9-42902aa477de@intel.com> <118cd1b9-1b50-3173-05b8-4293412ca78c@intel.com>
From: Xiaoyao Li
Date: Mon, 13 Sep 2021 10:58:20 +0800
User-Agent:
 Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0 Thunderbird/78.14.0
X-Mailing-List: linux-kernel@vger.kernel.org

On 9/10/2021 2:59 AM, Sean Christopherson wrote:
> On Tue, Sep 07, 2021, Xiaoyao Li wrote:
>> On 9/3/2021 12:36 AM, Sean Christopherson wrote:
>>> On Thu, Sep 02, 2021, Sean Christopherson wrote:
>>>> On Tue, Aug 03, 2021, Xiaoyao Li wrote:
>>>>> On 8/2/2021 11:46 PM, Sean Christopherson wrote:
>>>>>>>>> @@ -5642,6 +5653,31 @@ static int handle_bus_lock_vmexit(struct kvm_vcpu *vcpu)
>>>>>>>>>  	return 0;
>>>>>>>>>  }
>>>>>>>>> +static int handle_notify(struct kvm_vcpu *vcpu)
>>>>>>>>> +{
>>>>>>>>> +	unsigned long exit_qual = vmx_get_exit_qual(vcpu);
>>>>>>>>> +
>>>>>>>>> +	if (!(exit_qual & NOTIFY_VM_CONTEXT_INVALID)) {
>>>>>>>>
>>>>>>>> What does CONTEXT_INVALID mean?  The ISE doesn't provide any information whatsoever.
>>>>>>>
>>>>>>> It means whether the VM context is corrupted and not valid in the VMCS.
>>>>>>
>>>>>> Well that's a bit terrifying.  Under what conditions can the VM context become
>>>>>> corrupted?  E.g. if the context can be corrupted by an inopportune NOTIFY exit,
>>>>>> then KVM needs to be ultra conservative as a false positive could be fatal to a
>>>>>> guest.
>>>>>>
>>>>>
>>>>> Short answer is no case will set the VM_CONTEXT_INVALID bit.
>>>>
>>>> But something must set it, otherwise it wouldn't exist.
>>
>> For existing Intel silicon, no case will set it. Maybe in the future new
>> case will set it.
>>
>>>> The condition(s) under
>>>> which it can be set matters because it affects how KVM should respond.  E.g. if
>>>> the guest can trigger VM_CONTEXT_INVALID at will, then we should probably treat
>>>> it as a shutdown and reset the VMCS.
>>>
>>> Oh, and "shutdown" would be relative to the VMCS, i.e. if L2 triggers a NOTIFY
>>> exit with VM_CONTEXT_INVALID then KVM shouldn't kill the entire VM.  The least
>>> awful option would probably be to synthesize a shutdown VM-Exit to L1.  That
>>> won't communicate to L1 that vmcs12 state is stale/bogus, but I don't see any way
>>> to handle that via an existing VM-Exit reason :-/
>>>
>>>> But if VM_CONTEXT_INVALID can occur if and only if there's a hardware/ucode
>>>> issue, then we can do:
>>>>
>>>> 	if (KVM_BUG_ON(exit_qual & NOTIFY_VM_CONTEXT_INVALID, vcpu->kvm))
>>>> 		return -EIO;
>>>>
>>>> Either way, to enable this by default we need some form of documentation that
>>>> describes what conditions lead to VM_CONTEXT_INVALID.
>>
>> I still don't know why the conditions lead to it matters. I think the
>> consensus is that once VM_CONTEXT_INVALID happens, the vcpu can no longer
>> run.
>
> Yes, and no longer being able to run the vCPU is precisely the problem.  The
> condition(s) matters because if there's a possibility, however small, that enabling
> NOTIFY_WINDOW can kill a well-behaved guest then it absolutely cannot be enabled by
> default.

For now, no condition will set it. In the future, I believe it will be
set only for some fatal cases. However, we cannot guarantee that no
silicon bug will break a well-behaved guest. Maybe let's make it opt-in?

>> Either KVM_BUG_ON() or a specific EXIT to userspace should be OK?
>
> Not if the VM_CONTEXT_INVALID happens while L2 is running.  If software can trigger
> VM_CONTEXT_INVALID at will, then killing the VM would open up the door to a
> malicious L2 killing L1 (which would be rather ironic since this is an anti-DoS
> feature).  IIUC, VM_CONTEXT_INVALID only means the current VMCS is garbage, thus
> an occurrence while L2 is active means that vmcs02 is junk, but L1's state in vmcs01,
> vmcs12, etc... is still valid.
>

Maybe we can kill only L2 when VM_CONTEXT_INVALID happens in L2.
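
To make the options discussed above concrete, here is a rough userspace model of the decision, not kernel code and not the patch itself. The enum, the function names, and the bit position of NOTIFY_VM_CONTEXT_INVALID are assumptions made for illustration only. It also assumes that a NOTIFY exit without the bit set lets the vCPU simply be resumed, which the quoted hunk does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit position chosen for illustration; the thread does not state it. */
#define NOTIFY_VM_CONTEXT_INVALID (1ULL << 0)

/* Hypothetical outcomes, named for this sketch only. */
enum notify_action {
	NOTIFY_RESUME_GUEST,      /* context still valid: keep running the vCPU */
	NOTIFY_SHUTDOWN_TO_L1,    /* vmcs02 is junk: synthesize a shutdown VM-Exit to L1 */
	NOTIFY_EXIT_TO_USERSPACE, /* vmcs01 is junk: this vCPU can no longer run */
};

/*
 * Model of the policy under discussion: an invalid context is fatal only
 * for the VMCS that was active, so an occurrence while L2 runs must not
 * take down L1.
 */
enum notify_action notify_exit_policy(uint64_t exit_qual, bool l2_active)
{
	if (!(exit_qual & NOTIFY_VM_CONTEXT_INVALID))
		return NOTIFY_RESUME_GUEST;

	return l2_active ? NOTIFY_SHUTDOWN_TO_L1 : NOTIFY_EXIT_TO_USERSPACE;
}

/* Usage example: a NOTIFY exit from L2 with the bit set only affects L2. */
void notify_exit_policy_selfcheck(void)
{
	assert(notify_exit_policy(NOTIFY_VM_CONTEXT_INVALID, true) ==
	       NOTIFY_SHUTDOWN_TO_L1);
}
```

Keeping the decision a pure function of the exit qualification and the "L2 active" flag makes the nested case explicit. How the shutdown would actually be delivered to L1, and what userspace sees in the non-nested case, is left open here just as it is in the thread.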