Received: by 2002:a5b:505:0:0:0:0:0 with SMTP id o5csp7164854ybp; Wed, 16 Oct 2019 04:53:48 -0700 (PDT) X-Google-Smtp-Source: APXvYqw0xOJIw/C1fMjnI76k4pPUt/h3KVGncOVuPi3Xoo91jI2GgFDllJspUhjpsIrYqgxNzAWp X-Received: by 2002:a17:906:f42:: with SMTP id h2mr39217487ejj.39.1571226828269; Wed, 16 Oct 2019 04:53:48 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1571226828; cv=none; d=google.com; s=arc-20160816; b=sSJMhp2/pyYr5uiKTcFSK0UuCfQmDFxgFrs60x3qFM9rvb+OISq8+EkXQmj1TkLbbg jOnN7q6GeOQxk4eHynThUnZHm4kh/uVWHgfT6aVVlIkKC01pGMfEPZZ8AA+6encmMC6N kjBUPzKjN82Y1dlMy22nKbB2BqDpawwGIFbtDGtjQtSuMNklecEMoOaM3Lo0ZEIwRNlM OYgIeyIeKGtIsJTw6asVP6xepAcoGIy2Tu5a/YajcUkk1r0mo483pQPne/gmVQP382/e bvg3yQJO22FQZ5RvpTIj+wXGMVDLT1oxqTko0vlaIteccqPOsuVxOWMuNf7SLJ9h4B6o GAJg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=Q9RQswFrpbzD79vLdhUsjf/jFxBc2NuHuvcHjNoZOv4=; b=boHIkuly24A3TlZWDbk7lsZtb3ARVxfbU72tH+V+MAdBncjg50nQlaaDw1YCrZWhbO 184wJzhdiKxTFrkU6yk9f6RRFmUL/1YLDgGzwmUJRooZNllfOrZvyeCj722lNSWeMH5p iSYAgAoaQ8V9pP/3H2YFkfSsq355mbBfplILlwg7fUa/QnWslgOxzUsY6r+NEFToeoiJ WQ0195y61Zxb1pT7NqVtBiHaZh5MVlDAd/3w+5AFtUnq31c+9zSwa7oPf8T7fXw+j9VV H4An5u7hkFlvIu7CxDFkD8wCea1pOvQNms99xc++uavdPks4QJQrbHGBPFVc6rHFLTwZ x1Vw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id d15si16739548edb.7.2019.10.16.04.53.25; Wed, 16 Oct 2019 04:53:48 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2389026AbfJPG6L (ORCPT + 99 others); Wed, 16 Oct 2019 02:58:11 -0400 Received: from mga01.intel.com ([192.55.52.88]:64506 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387860AbfJPG6L (ORCPT ); Wed, 16 Oct 2019 02:58:11 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 15 Oct 2019 23:58:11 -0700 X-IronPort-AV: E=Sophos;i="5.67,302,1566889200"; d="scan'208";a="186064631" Received: from xiaoyaol-mobl.ccr.corp.intel.com (HELO [10.239.13.123]) ([10.239.13.123]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/AES256-SHA; 15 Oct 2019 23:58:07 -0700 Subject: Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock To: Sean Christopherson , Thomas Gleixner Cc: Fenghua Yu , Ingo Molnar , Borislav Petkov , H Peter Anvin , Peter Zijlstra , Andrew Morton , Dave Hansen , Paolo Bonzini , Radim Krcmar , Ashok Raj , Tony Luck , Dan Williams , Sai Praneeth Prakhya , Ravi V Shankar , linux-kernel , x86 , kvm@vger.kernel.org References: <1560897679-228028-1-git-send-email-fenghua.yu@intel.com> <1560897679-228028-10-git-send-email-fenghua.yu@intel.com> <20190626203637.GC245468@romley-ivt3.sc.intel.com> <20190925180931.GG31852@linux.intel.com> From: Xiaoyao Li Message-ID: Date: Wed, 16 Oct 2019 14:58:05 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0 MIME-Version: 1.0 In-Reply-To: <20190925180931.GG31852@linux.intel.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 9/26/2019 2:09 AM, Sean Christopherson wrote: > On Wed, Jun 26, 2019 at 11:47:40PM +0200, Thomas Gleixner wrote: >> So only one of the CPUs will win the cmpxchg race, set te variable to 1 and >> warn, the other and any subsequent AC on any other CPU will not warn >> either. So you don't need WARN_ONCE() at all. It's redundant and confusing >> along with the atomic_set(). >> >> Whithout reading that link [1], what Ingo proposed was surely not the >> trainwreck which you decided to put into that debugfs thing. > > We're trying to sort out the trainwreck, but there's an additional wrinkle > that I'd like your input on. > > We overlooked the fact that MSR_TEST_CTRL is per-core, i.e. shared by > sibling hyperthreads. This is especially problematic for KVM, as loading > MSR_TEST_CTRL during VM-Enter could cause spurious #AC faults in the kernel > and bounce MSR_TEST_CTRL.split_lock. > > E.g. if CPU0 and CPU1 are siblings and CPU1 is running a KVM guest with > MSR_TEST_CTRL.split_lock=1, hitting an #AC on CPU0 in the host kernel will > lead to suprious #AC faults and constant toggling of of the MSR. > > CPU0 CPU1 > > split_lock=enabled > > #AC -> disabled > > VM-Enter -> enabled > > #AC -> disabled > > VM-Enter -> enabled > > #AC -> disabled > > > > My thought to handle this: > > - Remove the per-cpu cache. > > - Rework the atomic variable to differentiate between "disabled globally" > and "disabled by kernel (on some CPUs)". > > - Modify the #AC handler to test/set the same atomic variable as the > sysfs knob. This is the "disabled by kernel" flow. > > - Modify the debugfs/sysfs knob to only allow disabling split-lock > detection. This is the "disabled globally" path, i.e. sends IPIs to > clear MSR_TEST_CTRL.split_lock on all online CPUs. > > - Modify the resume/init flow to clear MSR_TEST_CTRL.split_lock if it's > been disabled on *any* CPU via #AC or via the knob. > > - Modify the debugs/sysfs read function to either print the raw atomic > variable, or differentiate between "enabled", "disabled globally" and > "disabled by kernel". > > - Remove KVM loading of MSR_TEST_CTRL, i.e. KVM *never* writes the CPU's > actual MSR_TEST_CTRL. KVM still emulates MSR_TEST_CTRL so that the > guest can do WRMSR and handle its own #AC faults, but KVM doesn't > change the value in hardware. > > * Allowing guest to enable split-lock detection can induce #AC on > the host after it has been explicitly turned off, e.g. the sibling > hyperthread hits an #AC in the host kernel, or worse, causes a > different process in the host to SIGBUS. > > * Allowing guest to disable split-lock detection opens up the host > to DoS attacks. > > - KVM advertises split-lock detection to guest/userspace if and only if > split_lock_detect_disabled is zero. > > - Add a pr_warn_once() in KVM that triggers if split locks are disabled > after support has been advertised to a guest. > > Does this sound sane? > > The question at the forefront of my mind is: why not have the #AC handler > send a fire-and-forget IPI to online CPUs to disable split-lock detection > on all CPUs? Would the IPI be problematic? Globally disabling split-lock > on any #AC would (marginally) simplify the code and would eliminate the > oddity of userspace process (and KVM guest) #AC behavior varying based on > the physical CPU it's running on. > > > Something like: > > #define SPLIT_LOCK_DISABLED_IN_KERNEL BIT(0) > #define SPLIT_LOCK_DISABLED_GLOBALLY BIT(1) > > static atomic_t split_lock_detect_disabled = ATOMIT_INIT(0); > > void split_lock_detect_ac(void) > { > lockdep_assert_irqs_disabled(); > > /* Disable split lock detection on this CPU to avoid reentrant #AC. */ > wrmsrl(MSR_TEST_CTRL, > rdmsrl(MSR_TEST_CTRL) & ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT); > > /* > * If split-lock detection has not been disabled, either by the kernel > * or globally, record that it has been disabled by the kernel and > * WARN. Guarding WARN with the atomic ensures only the first #AC due > * to split-lock is logged, e.g. if multiple CPUs encounter #AC or if > * #AC is retriggered by a perf context NMI that interrupts the > * original WARN. > */ > if (atomic_cmpxchg(&split_lock_detect_disabled, 0, > SPLIT_LOCK_DISABLED_IN_KERNEL) == 0) > WARN(1, "split lock operation detected\n"); > } > > static ssize_t split_lock_detect_wr(struct file *f, const char __user *user_buf, > size_t count, loff_t *ppos) > { > int old; > > > > old = atomic_fetch_or(SPLIT_LOCK_DISABLED_GLOBALLY, > &split_lock_detect_disabled); > > /* Update MSR_TEST_CTRL unless split-lock was already disabled. */ > if (!(old & SPLIT_LOCK_DISABLED_GLOBALLY)) > on_each_cpu(split_lock_update, NULL, 1); > > return count; > } > Hi Thomas, Could you please have a look at Sean's proposal and give your opinion.