Date: Mon, 2 Nov 2020 10:33:59 -0800
From: Sean Christopherson
To: Andy Lutomirski
Cc: Tao Xu, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 "H. Peter Anvin", X86 ML, kvm list, LKML, Xiaoyao Li
Subject: Re: [PATCH] KVM: VMX: Enable Notify VM exit
Message-ID: <20201102183359.GE21563@linux.intel.com>
References: <20201102061445.191638-1-tao3.xu@intel.com>
 <20201102173130.GC21563@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Nov 02, 2020 at 10:01:16AM -0800, Andy Lutomirski wrote:
> On Mon, Nov 2, 2020 at 9:31 AM Sean Christopherson wrote:
> >
> > On Mon, Nov 02, 2020 at 08:43:30AM -0800, Andy Lutomirski wrote:
> > > On Sun, Nov 1, 2020 at 10:14 PM Tao Xu wrote:
> > > > 2. Another patch to disable interception of #DB and #AC when notify
> > > > VM-Exiting is enabled.
> > >
> > > Whoa there.
> > >
> > > A VM control that says "hey, CPU, if you messed up and livelocked for
> > > a long time, please break out of the loop" is not a substitute for
> > > fixing the livelocks.  So I don't think you get to disable
> > > interception of #DB and #AC.
> >
> > I think that can be incorporated into a module param, i.e. let the platform
> > owner decide which tool(s) they want to use to mitigate the legacy architecture
> > flaws.
>
> What's the point?  Surely the kernel should reliably mitigate the
> flaw, and the kernel should decide how to do so.

IMO, setting a reasonably low threshold _is_ mitigating such flaws.  E.g. it's
entirely possible, if not likely, that we can push the threshold below various
ENCLS instruction latencies.  Now I'm curious as to how exactly the accounting
is done under the hood, e.g. I assume retiring uops of a massive instruction is
enough to reset the timer, but I haven't actually read the specs in detail.

If userspace is truly malicious, it can easily spawn new VMs/processes to carry
out its attack, e.g.
exiting to userspace on these VM-Exits effectively throttles userspace as much
as straight killing the process.

> >
> > > I also think you should print a loud warning
> >
> > I'm not so sure on this one, e.g. userspace could just spin up a new instance
> > of its malicious guest and spam the kernel log.
>
> pr_warn_once()?

Or ratelimited.  My point was that a straight WARN would be less than ideal.

> If this triggers, it's a *bug*, right?  Kernel or CPU.

Sort of?  Many (all?) of the known scenarios that can trigger this exit are
unlikely to ever be fixed in silicon.  I'm not saying they shouldn't be fixed,
just that practically speaking they are highly unlikely to be fixed anytime
soon.  The infinite #DB/#AC recursion flaws are inarguably dumb CPU behavior,
but there are other scenarios that are less cut and dried, i.e. may not be
fixable without non-trivial tradeoffs.

> > > and have some intelligent handling when this new exit triggers.
> >
> > We discussed something similar in the context of the new bus lock VM-Exit.  I
> > don't know that it makes sense to try and add intelligence into the kernel.
> > In many use cases, e.g. clouds, the userspace VMM is trusted (inasmuch as
> > userspace can be trusted), while the guest is completely untrusted.  Reporting
> > the error to userspace and letting the userspace stack take action is likely
> > preferable to doing something fancy in the kernel.
> >
> >
> > Tao, this patch should probably be tagged RFC, at least until we can experiment
> > with the threshold on real silicon.  KVM and kernel behavior may depend on the
> > accuracy of detecting actual attacks, e.g. if we can set a threshold that has
> > zero false negatives and near-zero false positives, then it probably makes sense
> > to be more assertive in how such VM-Exits are reported and logged.
>
> If you can actually find a threshold that reliably mitigates the bug
> and does not allow a guest to cause undesirably large latency in the
> host, then fine.  1/10 of a tick is way too long, I think.

Yes, this was my internal review feedback as well.  Either that got lost along
the way or I wasn't clear enough in stating what should be used as a placeholder
until we have silicon in hand.