Date: Fri, 31 Jan 2020 12:01:34 -0800
From: Sean Christopherson
To: Andy Lutomirski
Cc: David Laight, Xiaoyao Li, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, "H. Peter Anvin", Paolo Bonzini, "x86@kernel.org",
    "linux-kernel@vger.kernel.org", "kvm@vger.kernel.org"
Subject: Re: [PATCH 1/2] KVM: x86: Emulate split-lock access as a write
Message-ID: <20200131200134.GD18946@linux.intel.com>
In-Reply-To: <777C5046-B9DE-4F8C-B04F-28A546AE4A3F@amacapital.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 30, 2020 at 07:16:24AM -0800, Andy Lutomirski wrote:
>
> > On Jan 30, 2020, at 4:31 AM, David Laight wrote:
> >
> >> If split lock detect is enabled (warn/fatal), #AC handler calls die()
> >> when split lock happens in kernel.
> >>
> >> A sane guest should never trigger emulation on a split-lock access, but
> >> it cannot prevent malicious guest from doing this. So just emulating the
> >> access as a write if it's a split-lock access to avoid malicious guest
> >> polluting the kernel log.
> >
> > That doesn't seem right if, for example, the locked access is addx.
> > ISTM it would be better to force an immediate fatal error of some
> > kind than just corrupt the guest memory.
>
> The existing page-spanning case is just as wrong.

Yes, it's a deliberate shortcut to handle a corner case that no real
world workload will ever trigger[*].

The split-lock #AC case is the same. Actually, it's significantly less
likely than the page-split case.

With a sane, non-malicious guest, the emulator code in question only
gets triggered if unrestricted guest is disabled. Without unrestricted
guest, there are certain modes, e.g. Big Real Mode, where VM-Enter will
fail, in which case KVM needs to emulate the entire guest code stream
until the guest transitions back to a valid mode (from VMX perspective).
When unrestricted guest is enabled, the emulator is only invoked for
MMIO, I/O strings, and for some instructions that are emulated on #UD to
allow migrating VMs between hosts with heterogeneous CPU capabilities.

Unrestricted guest is supported on all Intel CPUs since Westmere, and
will be supported on all CPUs that support split-lock #AC and VMX.
Except for a few esoteric use cases where using shadow paging is more
performant than using EPT, there is zero benefit to disabling
unrestricted guest, whereas enabling unrestricted guest provides
additional performance and security.

In other words, the odds of a sane, non-malicious guest executing a
split-lock instruction that needs to be emulated by KVM are basically
zilch.

The reason the emulator needs to handle this case at all is that a
malicious guest could play TLB games to trick KVM into emulating a
split-lock instruction, e.g. get the guest's translation for RIP
pointing at a string I/O instruction to trigger VM-Exit, but have the
host translation point at a completely different instruction. With
split-lock #AC enabled in the host, that would cause a kernel split-lock
#AC and panic the whole system.

Exiting to host userspace with "emulation failed" is the other
reasonable alternative, but that's basically the same as killing the
guest. We're arguing that, in the extremely unlikely event that there
is a workload out there that hits this, it's preferable to *maybe*
corrupt guest memory and log the anomaly in the kernel log, as opposed
to outright killing the guest with a generic "emulation failed".

Looking forward, the other reason for taking this shortcut is to easily
handle the case where KVM adds support for exposing split-lock #AC to
the guest. With this approach, we don't have to teach the emulator how
to query for split-lock #AC enabling in the guest. Again, this is in the
interest of not adding code to the emulator that is effectively useless.

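For anyone following along, here is a rough sketch of the shortcut being
discussed. It is not the actual patch: the helper names, the fixed
64-byte cache line, and the host_split_lock_detect parameter are all
assumptions made up for illustration. The idea is only that a locked
access which straddles a page (the existing case) or a cache line (the
new case, when the host has split-lock detection on) is handled as a
plain write instead of an atomic cmpxchg.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; real code would use the kernel's page size
 * and the CPU's reported cache line size. */
#define SKETCH_PAGE_SIZE  4096u
#define SKETCH_CACHE_LINE 64u

/* True if [addr, addr + bytes) crosses a boundary of the given
 * power-of-two size. */
static bool spans_boundary(uint64_t addr, unsigned int bytes, uint64_t size)
{
	assert(bytes > 0 && size > 0 && (size & (size - 1)) == 0);

	uint64_t mask = ~(size - 1);

	return ((addr + bytes - 1) & mask) != (addr & mask);
}

enum sketch_action { SKETCH_ATOMIC_CMPXCHG, SKETCH_PLAIN_WRITE };

/* Hypothetical decision helper: choose how to emulate a locked access. */
static enum sketch_action sketch_locked_access(uint64_t gpa,
					       unsigned int bytes,
					       bool host_split_lock_detect)
{
	/* Existing shortcut: page-spanning accesses become plain writes. */
	if (spans_boundary(gpa, bytes, SKETCH_PAGE_SIZE))
		return SKETCH_PLAIN_WRITE;

	/* Proposed shortcut: avoid a split-lock #AC in the host kernel. */
	if (host_split_lock_detect &&
	    spans_boundary(gpa, bytes, SKETCH_CACHE_LINE))
		return SKETCH_PLAIN_WRITE;

	return SKETCH_ATOMIC_CMPXCHG;
}
```

In this sketch an 8-byte access at offset 60 straddles the 64-byte line,
so it takes the plain-write path only when host split-lock detection is
on; the same access at offset 64 stays on the atomic path either way.
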
[*] https://lkml.kernel.org/r/c8b2219b-53d5-38d2-3407-2476b45500eb@redhat.com