Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp1329227imu; Wed, 16 Jan 2019 17:21:59 -0800 (PST) X-Google-Smtp-Source: ALg8bN7ZkM5kfuFhJb1KsWIxXmtwJFAUceFgJ/0aP9vR/1/M8NhJAmhYSun07lWWi/ChqeeAZcPl X-Received: by 2002:a63:1c61:: with SMTP id c33mr11236005pgm.354.1547688119690; Wed, 16 Jan 2019 17:21:59 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1547688119; cv=none; d=google.com; s=arc-20160816; b=vXdy2bKX6WIXUzf55+nvncisTvDpcTLnC7omJG3YS7fFwYsBPg+rKXoThfBmd6tyyK hoy11ltVOlOxvjif7GVQMhxU0qRnE5R/jkt/0WAIFyRhicpADAueRQv+yxJNr3F703GT uATFxYE+ltbHJ2YAffJCxFXS5x8VcA5mRzex2uWCKvppiDU3v04EfJIjOTIOChBu2Vob GtlFQjkQ50X97FqvbZV6a/PIq3aRQgcT1FwLFBmQmJZLzbYJaxZGQ1LbSEfIS9e60zlH wFiwBFzJPGwC6TVwsd9vFbNeYmhZ+Kh00YT58G55BzxWQWqenc2kGq2C9Cq7lXIURX1e lAZQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=iKAMi3rRPNNkTRCNfBhC6+g28oUvfn5zTqtaArUqeMA=; b=d1EGbTk3FJiIWn3fHiPyzdrJ9+QkECodlMc+1y0EjkkVvkTWioRstWTJd8oU36SKpz CNuyJHgAuAkzvtJQztysTlhx1JgexmfRlL0eEoStuIojIJYdt/VolTAKb2xsVMGCnTMd UFrsMA1odpGMVjJdipcuSrn+KbahGXbh8pCCFtBOK2HFZAQfk++6aiuZGYf0swNjHhBk k3XqI8TVyEuj2qKM1BhanSiRQeGBxjgW01PxD7pQpcr6r8z9esH4pmshCrtu101OJzqs 6CfovLGSXHlp5O9fkB/brV086QwIClf4EDUP8yiw6SxSwjYQf68XlwM1bxskAsafThDp jpEQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=a5y3kYDi; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id e68si73545pfb.101.2019.01.16.17.21.40; Wed, 16 Jan 2019 17:21:59 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=a5y3kYDi; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728104AbfAQAAo (ORCPT + 99 others); Wed, 16 Jan 2019 19:00:44 -0500 Received: from mail.kernel.org ([198.145.29.99]:45126 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725385AbfAQAAo (ORCPT ); Wed, 16 Jan 2019 19:00:44 -0500 Received: from mail-wr1-f48.google.com (mail-wr1-f48.google.com [209.85.221.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 7E59E2086D for ; Thu, 17 Jan 2019 00:00:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1547683242; bh=6GXAVV0Hf4c0UmDO8U8Rk7qsoiz9If8sPlY95CfVxuI=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=a5y3kYDiBbD3ApaHGPF66y27vrWh0wlD135Jd1qC0/Jlhyn1zuc+ZGVPgrro37GrX F1FzZuteGlec7JexdlA5RV21Lz6BHGWp7ZMPQWmlIAYN5rVT0qWJxc1PJ9ACkIhsMM pvznhh4sptMkelH3bh1SsZVHwXabqCg0+PcwVBN4= Received: by mail-wr1-f48.google.com with SMTP id v13so9007530wrw.5 for ; Wed, 16 Jan 2019 16:00:42 -0800 (PST) X-Gm-Message-State: AJcUukc9UnFJZCo9y7hqSYcEq3kqZ5vYz7Q7N8HpbdT8L3G6NcToIs/P 5IpSz2IYLPda4EoUoNiSr7VlKGaYgTIYH+UJDBGSaw== X-Received: by 2002:adf:e08c:: with SMTP id c12mr9123044wri.199.1547683240918; Wed, 16 Jan 2019 16:00:40 -0800 (PST) MIME-Version: 1.0 References: <1547673522-226408-1-git-send-email-fenghua.yu@intel.com> <1547673522-226408-4-git-send-email-fenghua.yu@intel.com> In-Reply-To: <1547673522-226408-4-git-send-email-fenghua.yu@intel.com> From: Andy Lutomirski Date: Wed, 16 Jan 2019 16:00:29 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH v2 3/3] x86/umwait: Control umwait maximum time To: Fenghua Yu Cc: Thomas Gleixner , Borislav Petkov , Ingo Molnar , H Peter Anvin , Andrew Cooper , Ashok Raj , Ravi V Shankar , linux-kernel , x86 Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jan 16, 2019 at 1:24 PM Fenghua Yu wrote: > > IA32_UMWAIT_CONTROL[31:2] determines the maximum time in TSC-quanta > that processor can stay in C0.1 or C0.2. > > The maximum time value in IA32_UMWAIT_CONTROL[31-2] is set as zero which > means there is no global time limit for UMWAIT and TPAUSE instructions. > Each process sets its own umwait maximum time as the instructions operand. > > User can specify global umwait maximum time through interface: > /sys/devices/system/cpu/umwait_control/umwait_max_time > The value in the interface is in decimal in TSC-quanta. Bits[1:0] > are cleared when the value is stored. > > Signed-off-by: Fenghua Yu > --- > arch/x86/include/asm/msr-index.h | 2 ++ > arch/x86/power/umwait.c | 42 +++++++++++++++++++++++++++++++- > 2 files changed, 43 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h > index b56bfecae0de..42b9104fc15b 100644 > --- a/arch/x86/include/asm/msr-index.h > +++ b/arch/x86/include/asm/msr-index.h > @@ -62,6 +62,8 @@ > #define MSR_IA32_UMWAIT_CONTROL 0xe1 > #define UMWAIT_CONTROL_C02_BIT 0x0 > #define UMWAIT_CONTROL_C02_MASK 0x00000001 > +#define UMWAIT_CONTROL_MAX_TIME_BIT 0x2 > +#define UMWAIT_CONTROL_MAX_TIME_MASK 0xfffffffc > > #define MSR_PKG_CST_CONFIG_CONTROL 0x000000e2 > #define NHM_C3_AUTO_DEMOTE (1UL << 25) > diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c > index 95b3867aac1e..4a1a507d3bb7 100644 > --- a/arch/x86/power/umwait.c > +++ b/arch/x86/power/umwait.c > @@ -10,6 +10,7 @@ > #include > > static int umwait_enable_c0_2 = 1; /* 0: disable C0.2. 1: enable C0.2. */ > +static u32 umwait_max_time; /* In TSC-quanta. Only bits [31:2] are used. */ > static DEFINE_MUTEX(umwait_lock); > > /* Return value that will be used to set umwait control MSR */ > @@ -20,7 +21,8 @@ static inline u32 umwait_control_val(void) > * When bit 0 is 1, C0.2 is disabled. Otherwise, C0.2 is enabled. > * So value in bit 0 is opposite of umwait_enable_c0_2. > */ > - return ~umwait_enable_c0_2 & UMWAIT_CONTROL_C02_MASK; > + return (~umwait_enable_c0_2 & UMWAIT_CONTROL_C02_MASK) | > + umwait_max_time; > } > > static ssize_t umwait_enable_c0_2_show(struct device *dev, > @@ -61,8 +63,46 @@ static ssize_t umwait_enable_c0_2_store(struct device *dev, > > static DEVICE_ATTR_RW(umwait_enable_c0_2); > > +static ssize_t umwait_max_time_show(struct device *kobj, > + struct device_attribute *attr, char *buf) > +{ > + return sprintf(buf, "%u\n", umwait_max_time); > +} > + > +static ssize_t umwait_max_time_store(struct device *kobj, > + struct device_attribute *attr, > + const char *buf, size_t count) > +{ > + u32 msr_val, max_time; > + int cpu, ret; > + > + ret = kstrtou32(buf, 10, &max_time); > + if (ret) > + return ret; > + > + mutex_lock(&umwait_lock); > + > + /* Only get max time value from bits [31:2] */ > + max_time &= UMWAIT_CONTROL_MAX_TIME_MASK; > + /* Update the max time value in memory */ > + umwait_max_time = max_time; > + msr_val = umwait_control_val(); > + get_online_cpus(); > + /* All CPUs have same umwait max time */ > + for_each_online_cpu(cpu) > + wrmsr_on_cpu(cpu, MSR_IA32_UMWAIT_CONTROL, msr_val, 0); > + put_online_cpus(); > + > + mutex_unlock(&umwait_lock); > + > + return count; > +} > + > +static DEVICE_ATTR_RW(umwait_max_time); > + > static struct attribute *umwait_attrs[] = { > &dev_attr_umwait_enable_c0_2.attr, > + &dev_attr_umwait_max_time.attr, > NULL > }; You need something to make sure that newly onlined CPUs get the right value in the MSR. You also need to make sure you restore it on resume from suspend. Something like cpu_init() might be the right place. Also, as previously discussed, I think we should set the default to something quite small, maybe 100 microseconds. IMO the goal is to pick a value that is a high enough multiple of the C0.2 entry+exit latency that we get most of the power and SMT resource savings while being small enough that no one things that UMWAIT is more than a glorified, slightly improved, and far more misleading version of REP NOP. Andrew, would having Linux default to a small value do much to mitigate your concerns that UMWAIT is problematic for hypervisors? Also, can someone who understands the hardware clarify just how dangerous UMWAIT is from a perspective of making speculation attacks more dangerous than they already are? I'm wondering what events will wake up a UMONITOR. I bet that CLFLUSH does. I wonder if a faulting write to a read-only page also does. Or a load from a remote node. Or a speculative store that does not subsequently retire. This instruction seems quite delightful as a tool to create a highish-bandwidth covert channel, and it's possibly quite nice to agument Spectre-like attacks. If it ends up being bad enough, we might need to set the default timeout to the absolute minimum value and possibly ask Intel to give us a way to turn it off entirely.