Date: Thu, 17 Oct 2019 23:31:15 +0200 (CEST)
From: Thomas Gleixner
To: Sean Christopherson
cc: Paolo Bonzini, Xiaoyao Li, Fenghua Yu, Ingo Molnar, Borislav Petkov,
    H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen, Radim Krcmar,
    Ashok Raj, Tony Luck, Dan Williams, Sai Praneeth Prakhya, Ravi V Shankar,
    linux-kernel, x86, kvm@vger.kernel.org
Subject: Re: [RFD] x86/split_lock: Request to Intel
In-Reply-To: <20191017172312.GC20903@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 17 Oct 2019, Sean Christopherson wrote:
> On Thu, Oct 17, 2019 at 02:29:45PM +0200, Thomas Gleixner wrote:
> > The more I look at this trainwreck, the less interested I am in merging any
> > of this at all.
> >
> > The fact that it took Intel more than a year to figure out that the MSR is
> > per core and not per thread is yet another proof that this industry just
> > works by pure chance.
> >
> > There is a simple way out of this misery:
> >
> > Intel issues a microcode update which does:
> >
> >     1) Convert the OR logic of the AC enable bit in the TEST_CTRL MSR to
> >        AND logic, i.e. when one thread disables AC it's automatically
> >        disabled on the core.
> >
> >        Alternatively it suppresses the #AC when the current thread has it
> >        disabled.
> >
> >     2) Provide a separate bit which indicates that the AC enable logic is
> >        actually AND based or that #AC is suppressed when the current thread
> >        has it disabled.
> >
> > Which way I don't really care as long as it makes sense.
>
> The #AC bit doesn't use OR-logic, it's straight up shared, i.e. writes on
> one CPU are immediately visible on its sibling CPU.

That's less horrible than I read out of your initial explanation.

Thankfully all of this is meticulously documented in the SDM ...

Though it changes the picture radically. The truly shared MSR allows
regular software synchronization without IPIs and without an insane amount
of corner case handling.

So as you pointed out we need a per core state, which is influenced by:

 1) The global enablement switch

 2) Host induced #AC

 3) Guest induced #AC

    A) Guest has #AC handling

    B) Guest has no #AC handling

#1:

   - OFF:   #AC is globally disabled

   - ON:    #AC is globally enabled

   - FORCE: same as ON but #AC is enforced on guests

#2:

   If the host triggers an #AC then the #AC has to be force disabled on the
   affected core independent of the state of #1. Nothing we can do about
   that and once the initial wave of #AC issues is fixed this should not
   happen on production systems. That disables #3 even for the #3.A case
   for simplicity's sake.

#3:

   A) Guest has #AC handling

      #AC is forwarded to the guest. No further action required aside from
      accounting.

   B) Guest has no #AC handling

      If #AC triggers the resulting action depends on the state of #1:

         - FORCE: Guest is killed with SIGBUS or whatever the virt crowd
                  thinks is the appropriate solution

         - ON:    #AC triggered state is recorded per vCPU and the MSR is
                  toggled on VMENTER/VMEXIT in software from that point on.

So the only interesting case is #3.B and #1.state == ON. There you need
serialization of the state and the MSR write between the cores, but only
when the vCPU triggered an #AC. Until then, nothing to do.

vmenter()
{
        if (vcpu->ac_disable)
                this_core_disable_ac();
}

vmexit()
{
        if (vcpu->ac_disable)
                this_core_enable_ac();
}

this_core_dis/enable_ac() takes the global state into account and has the
necessary serialization in place.

Thanks,

        tglx
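
For illustration only, the scheme outlined above can be modeled in user-space
C roughly as follows. This is a sketch under stated assumptions, not kernel
code: the shared TEST_CTRL bit is a plain bool, the serialization is a toy
spinlock, and the names sld_global_mode, core_state, guest_disables,
holds_disable, core_update(), core_init(), host_ac_triggered() and
guest_ac_triggered() are all hypothetical. The per-core counter and the
holds_disable flag are assumptions added so that two sibling vCPUs can disable
#AC independently; the mail only requires that this_core_dis/enable_ac() be
serialized and take the global state into account. What happens for #3.B when
the switch is OFF is not covered by the mail and is only flagged here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* #1: the global enablement switch (variable name is hypothetical). */
enum sld_mode { SLD_OFF, SLD_ON, SLD_FORCE };
enum sld_mode sld_global_mode = SLD_ON;

/* Per-core state, shared by the sibling threads of one core. */
struct core_state {
	atomic_flag lock;     /* toy spinlock: serializes state + MSR write */
	bool host_disabled;   /* #2: host hit #AC, sticky force-disable */
	int guest_disables;   /* assumption: vCPUs now running with #AC off */
	bool msr_ac_enabled;  /* stand-in for the shared TEST_CTRL bit */
};

struct vcpu {
	bool has_ac_handling; /* #3.A versus #3.B */
	bool ac_disable;      /* recorded once this vCPU triggered #AC */
	bool holds_disable;   /* assumption: pairs vmenter() with vmexit() */
};

enum guest_ac_action {
	GUEST_AC_FORWARD,     /* #3.A: forward to the guest */
	GUEST_AC_KILL,        /* #3.B with FORCE */
	GUEST_AC_TOGGLE_MSR,  /* #3.B with ON */
	GUEST_AC_UNEXPECTED,  /* #3.B with OFF: not covered by the mail */
};

/* Serialized update of the per-core state and the modeled MSR bit. */
static void core_update(struct core_state *c, bool host_hit, int delta)
{
	while (atomic_flag_test_and_set(&c->lock))
		; /* spin until the sibling thread releases the lock */
	c->host_disabled |= host_hit;
	assert(delta >= 0 || c->guest_disables > 0);
	c->guest_disables += delta;
	/* A real implementation would write the MSR here. */
	c->msr_ac_enabled = sld_global_mode != SLD_OFF &&
			    !c->host_disabled && c->guest_disables == 0;
	atomic_flag_clear(&c->lock);
}

void host_ac_triggered(struct core_state *c)    { core_update(c, true, 0); }
void this_core_disable_ac(struct core_state *c) { core_update(c, false, 1); }
void this_core_enable_ac(struct core_state *c)  { core_update(c, false, -1); }

void core_init(struct core_state *c)
{
	atomic_flag_clear(&c->lock);
	c->host_disabled = false;
	c->guest_disables = 0;
	core_update(c, false, 0);
}

/* #3: decide what to do when a guest triggers #AC. */
enum guest_ac_action guest_ac_triggered(struct vcpu *v)
{
	if (v->has_ac_handling)
		return GUEST_AC_FORWARD;
	switch (sld_global_mode) {
	case SLD_FORCE:
		return GUEST_AC_KILL;
	case SLD_ON:
		v->ac_disable = true;
		return GUEST_AC_TOGGLE_MSR;
	default:
		return GUEST_AC_UNEXPECTED;
	}
}

void vmenter(struct vcpu *v, struct core_state *c)
{
	if (v->ac_disable) {
		this_core_disable_ac(c);
		v->holds_disable = true;
	}
}

void vmexit(struct vcpu *v, struct core_state *c)
{
	if (v->holds_disable) {
		this_core_enable_ac(c);
		v->holds_disable = false;
	}
}
```

In this model a vCPU that never triggered #AC costs nothing on entry or exit,
which matches the "until then, nothing to do" remark. The counter keeps the
modeled bit off until the last sibling vCPU with ac_disable set has left guest
mode, and a host-induced #AC keeps it off for good.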