Date: Mon, 3 Feb 2020 22:03:53 -0800
From: Sean Christopherson
To: Andy Lutomirski
Cc: Xiaoyao Li, Thomas Gleixner, Ingo Molnar, Borislav Petkov, "H. Peter Anvin", Paolo Bonzini, X86 ML, LKML, kvm list
Subject: Re: [PATCH 2/2] KVM: VMX: Extend VMX's #AC handding
Message-ID: <20200204060353.GB31665@linux.intel.com>
References: <0fe84cd6-dac0-2241-59e5-84cb83b7c42b@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Feb 03, 2020 at 10:49:50AM -0800, Andy Lutomirski wrote:
> On Sat, Feb 1, 2020 at 8:33 PM Xiaoyao Li wrote:
> >
> > On 2/2/2020 1:56 AM, Andy Lutomirski wrote:
> > >
> > > There are two independent problems here. First, SLD *can’t* be
> > > virtualized sanely because it’s per-core not per-thread.
> >
> > Sadly, it's the fact we cannot change. So it's better virtualized only when
> > SMT is disabled to make thing simple.
> >
> > > Second, most users *won’t want* to virtualize it correctly even if they
> > > could: if a guest is allowed to do split locks, it can DoS the system.
> >
> > To avoid DoS attack, it must use sld_fatal mode. In this case, guest are
> > forbidden to do split locks.
> >
> > > So I think there should be an architectural way to tell a guest that SLD
> > > is on whether it likes it or not. And the guest, if booted with sld=warn,
> > > can print a message saying “haha, actually SLD is fatal” and carry on.

Ya, but to make it architectural it needs to be actual hardware behavior.
I highly doubt we can get explicit documentation in the SDM regarding the
behavior of a hypervisor. E.g. the "official" hypervisor CPUID bit and
CPUID range are documented in the SDM as simply being reserved until the
end of time. Getting a bit reserved in the SDM does us no good, as VMM
handling of the bit would still be determined by convention.
But getting actual hardware behavior documented in the SDM would serve our
purposes, even if it's too late to get it implemented for the first CPUs
that support SLD. It would, in theory, require kernels to be prepared to
handle a sticky SLD bit and define a common way for VMMs to virtualize the
behavior.

A sticky/lock bit in the MSR is probably the easiest to implement in ucode?
That'd be easy for KVM to emulate, and then the kernel init code would look
something like:

	static void split_lock_init(void)
	{
		u64 test_ctrl_val;

		/* RDMSR faulted: no MSR_TEST_CTRL, so no SLD. */
		if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val)) {
			sld_state = sld_off;
			return;
		}

		/*
		 * MSR_TEST_CTRL_LOCK_DETECT_STICKY is the proposed (not yet
		 * existing) sticky/lock bit.  If SLD is on and locked, the
		 * kernel can't turn it off, so treat it as fatal regardless
		 * of the sld= mode.
		 */
		if (sld_state != sld_fatal &&
		    (test_ctrl_val & MSR_TEST_CTRL_LOCK_DETECT) &&
		    (test_ctrl_val & MSR_TEST_CTRL_LOCK_DETECT_STICKY)) {
			pr_crit("haha, actually SLD is fatal\n");
			sld_state = sld_fatal;
			return;
		}

		...
	}

> > OK. Let me sort it out.
> >
> > If SMT is disabled/unsupported, so KVM advertises SLD feature to guest.
> > below are all the case:
> >
> > -----------------------------------------------------------------------
> >    Host  Guest  Guest behavior
> > -----------------------------------------------------------------------
> > 1. off          same as in bare metal
> > -----------------------------------------------------------------------
> > 2. warn  off    allow guest do split lock (for old guest):
> >                 hardware bit set initially, once split lock
> >                 happens, clear hardware bit when vcpu is running
> >                 So, it's the same as in bare metal
> >
> > 3.       warn   1. user space: get #AC, then clear MSR bit, but
> >                 hardware bit is not cleared, #AC again, finally
> >                 clear hardware bit when vcpu is running.
> >                 So it's somehow the same as in bare-metal
>
> Well, kind of, except that the warning is inaccurate -- there is no
> guarantee that the hardware bit will be set at all when the guest is
> running.  This doesn't sound *that* bad, but it does mean that the
> guest doesn't get the degree of DoS protection it thinks it's getting.
>
> My inclination is that, the host mode is warn, then SLD should not be
> exposed to the guest at all and the host should fully handle it.

KVM can expose it to the guest.  KVM just needs to ensure SLD is turned on
prior to VM-Enter with vcpu->msr_test_ctrl.sld=1, which is easy enough.

> >                 2. kernel: same as in bare metal.
> >
> > 4.       fatal  same as in bare metal
> > ----------------------------------------------------------------------
> > 5. fatal off    guest is killed when split lock,
> >                 or forward #AC to guest, this way guest gets an
> >                 unexpected #AC
>
> Killing the guest seems like the right choice.  But see below -- this
> is not ideal if the guest is new.
>
> >
> > 6.       warn   1. user space: get #AC, then clear MSR bit, but
> >                 hardware bit is not cleared, #AC again,
> >                 finally guest is killed, or KVM forwards #AC
> >                 to guest then guest gets an unexpected #AC.
> >                 2. kernel: same as in bare metal, call die();
> >
> > 7.       fatal  same as in bare metal
> > ----------------------------------------------------------------------
> >
> > Based on the table above, if we want guest has same behavior as in bare
> > metal, we can set host to sld_warn mode.
>
> I don't think this is correct.  If the host is in warn mode, then the
> guest behavior will be erratic.  I'm not sure it makes sense for KVM
> to expose such erratic behavior to the guest.

It's doable without introducing non-architectural behavior and without too
much pain on KVM's end.

https://lkml.kernel.org/r/20200204053552.GA31665@linux.intel.com

> > If we want prevent DoS from guest, we should set host to sld_fatal mode.
> >
> >
> > Now, let's analysis what if there is an architectural way to tell a
> > guest that SLD is forced on. Assume it's a SLD_forced_on cpuid bit.
> >
> > - Host is sld_off, SLD_forced_on cpuid bit is not set, no change for
> > case #1