Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp1334402imu; Wed, 16 Jan 2019 17:29:08 -0800 (PST) X-Google-Smtp-Source: ALg8bN4LzBdmCMopV1yKrfIN+Tvt5OnmHiDnbtjCoTFwK17wm3/tBvX+ra37nnnQG/K8+mUAJbT+ X-Received: by 2002:a65:4381:: with SMTP id m1mr11390605pgp.358.1547688548875; Wed, 16 Jan 2019 17:29:08 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1547688548; cv=none; d=google.com; s=arc-20160816; b=H6butuCVJ+als5lB9hwfLsVUQAxeQgDO4VhosVn0pW1TKG2rtKuap+tBE2TGxcM8+s u4QXoahQDR7pqoTbT/cXQwO0wu4C2IS5E8MJzD/BfePX1ho2KLH/bJtlTOm5uFK+FkIF kskeOlEzCoeSVCVAkluFwvovdBYcd3123eEzFcU6BO2xio6AiYn8XLKloqpp9MEPJG06 OFOdGm8IvHQsPn6Da3DZm1H2xPNUEZS14wOz7rxKtCLIxlyC62nxZFCwxRr9sxfQmznD lEMyQ0ilz9H1NxrBQfQzAveC4swzwJeQL/DdMn5IFklVkRa3w1w8rirIftyKQl9KqaXB Kx9A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=wXIF8yy1JmOmhSAauChmHO87uY89iP8PLCoCOqZj4HE=; b=pQD5SNCvrshtjee4GTMJ4VhyZgRXY66kiczj4f+/SlLTC4Q3pZbZPtD/4W66m3mQoA HXvxsMR+NTkllaVAfrSA4zCc1h+8+xa64+fOY8QEWvU0iXIo+rQA3EqDFSHG/OLHkccJ 6xEZtE5gzqr1Qco1xmi99Aj46wPJB/SFv6FJB+tgpRo4mrgdnIDidvpYNZCjCD8QPysp /oQB5fEU3cbrZRH1pk2oDWw4Ijj+OFptEXDM/Nle/U09GSW5WLV+Brgi22eO+WnUhTm6 OsIrYvXaG2It3LCannDhdZZpdTXi5RF0P1cSi4AwmwLbU9gwf57NdPoRa9513KlSUyS6 2sTQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=y27PGHqq; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f63si87665pfg.136.2019.01.16.17.28.50; Wed, 16 Jan 2019 17:29:08 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=y27PGHqq; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728795AbfAQAbP (ORCPT + 99 others); Wed, 16 Jan 2019 19:31:15 -0500 Received: from mail.kernel.org ([198.145.29.99]:60686 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727397AbfAQAbO (ORCPT ); Wed, 16 Jan 2019 19:31:14 -0500 Received: from mail-wm1-f53.google.com (mail-wm1-f53.google.com [209.85.128.53]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 1CC0A2077B for ; Thu, 17 Jan 2019 00:31:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1547685073; bh=tav6rfRlDroqmk6NNdiB/uW4L3Gtpz9vYxlAVSrqvOc=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=y27PGHqq2V2r4PPPPIQSXSNmuOZ6CanIaOXP4/Tkg5JEFKiji1YeFXvjSGFMTgMnf K4FiPEv6lBAxvUCsOo1DHI57u4eCucItBOXpbEC9NcfaRl/fn0BoFCiRaZgi+gH5SL /lGb60eWB+n1mGvQgpCDr8kWKJD9q1YAaom+QNcw= Received: by mail-wm1-f53.google.com with SMTP id p6so3946308wmc.1 for ; Wed, 16 Jan 2019 16:31:13 -0800 (PST) X-Gm-Message-State: AJcUukd4B9HrwxLFiCc9byCsQv6pOG6k2J5Lok6GpRdpPmNgVeobPxiZ eW0GwYMiBgN2sFfscdHl4VQwTXxCITH+6HtqXGWd3w== X-Received: by 2002:a1c:f112:: with SMTP id p18mr8976104wmh.83.1547685071506; Wed, 16 Jan 2019 16:31:11 -0800 (PST) MIME-Version: 1.0 References: <1547673522-226408-1-git-send-email-fenghua.yu@intel.com> <1547673522-226408-4-git-send-email-fenghua.yu@intel.com> <20190117000731.GA226938@romley-ivt3.sc.intel.com> In-Reply-To: <20190117000731.GA226938@romley-ivt3.sc.intel.com> From: Andy Lutomirski Date: Wed, 16 Jan 2019 16:30:59 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH v2 3/3] x86/umwait: Control umwait maximum time To: Fenghua Yu Cc: Andy Lutomirski , Thomas Gleixner , Borislav Petkov , Ingo Molnar , H Peter Anvin , Andrew Cooper , Ashok Raj , Ravi V Shankar , linux-kernel , x86 Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jan 16, 2019 at 4:13 PM Fenghua Yu wrote: > > On Wed, Jan 16, 2019 at 04:00:29PM -0800, Andy Lutomirski wrote: > > On Wed, Jan 16, 2019 at 1:24 PM Fenghua Yu wrote: > > > > > > IA32_UMWAIT_CONTROL[31:2] determines the maximum time in TSC-quanta > > > that processor can stay in C0.1 or C0.2. > > > > > > The maximum time value in IA32_UMWAIT_CONTROL[31-2] is set as zero which > > > means there is no global time limit for UMWAIT and TPAUSE instructions. > > > Each process sets its own umwait maximum time as the instructions operand. > > > > > > User can specify global umwait maximum time through interface: > > > /sys/devices/system/cpu/umwait_control/umwait_max_time > > > The value in the interface is in decimal in TSC-quanta. Bits[1:0] > > > are cleared when the value is stored. > > > > > > Signed-off-by: Fenghua Yu > > > --- > > > arch/x86/include/asm/msr-index.h | 2 ++ > > > arch/x86/power/umwait.c | 42 +++++++++++++++++++++++++++++++- > > > 2 files changed, 43 insertions(+), 1 deletion(-) > > > > > > diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h > > > index b56bfecae0de..42b9104fc15b 100644 > > > --- a/arch/x86/include/asm/msr-index.h > > > +++ b/arch/x86/include/asm/msr-index.h > > > @@ -62,6 +62,8 @@ > > > #define MSR_IA32_UMWAIT_CONTROL 0xe1 > > > #define UMWAIT_CONTROL_C02_BIT 0x0 > > > #define UMWAIT_CONTROL_C02_MASK 0x00000001 > > > +#define UMWAIT_CONTROL_MAX_TIME_BIT 0x2 > > > +#define UMWAIT_CONTROL_MAX_TIME_MASK 0xfffffffc > > > > > > #define MSR_PKG_CST_CONFIG_CONTROL 0x000000e2 > > > #define NHM_C3_AUTO_DEMOTE (1UL << 25) > > > diff --git a/arch/x86/power/umwait.c b/arch/x86/power/umwait.c > > > index 95b3867aac1e..4a1a507d3bb7 100644 > > > --- a/arch/x86/power/umwait.c > > > +++ b/arch/x86/power/umwait.c > > > @@ -10,6 +10,7 @@ > > > #include > > > > > > static int umwait_enable_c0_2 = 1; /* 0: disable C0.2. 1: enable C0.2. */ > > > +static u32 umwait_max_time; /* In TSC-quanta. Only bits [31:2] are used. */ > > > static DEFINE_MUTEX(umwait_lock); > > > > > > /* Return value that will be used to set umwait control MSR */ > > > @@ -20,7 +21,8 @@ static inline u32 umwait_control_val(void) > > > * When bit 0 is 1, C0.2 is disabled. Otherwise, C0.2 is enabled. > > > * So value in bit 0 is opposite of umwait_enable_c0_2. > > > */ > > > - return ~umwait_enable_c0_2 & UMWAIT_CONTROL_C02_MASK; > > > + return (~umwait_enable_c0_2 & UMWAIT_CONTROL_C02_MASK) | > > > + umwait_max_time; > > > } > > > > > > static ssize_t umwait_enable_c0_2_show(struct device *dev, > > > @@ -61,8 +63,46 @@ static ssize_t umwait_enable_c0_2_store(struct device *dev, > > > > > > static DEVICE_ATTR_RW(umwait_enable_c0_2); > > > > > > +static ssize_t umwait_max_time_show(struct device *kobj, > > > + struct device_attribute *attr, char *buf) > > > +{ > > > + return sprintf(buf, "%u\n", umwait_max_time); > > > +} > > > + > > > +static ssize_t umwait_max_time_store(struct device *kobj, > > > + struct device_attribute *attr, > > > + const char *buf, size_t count) > > > +{ > > > + u32 msr_val, max_time; > > > + int cpu, ret; > > > + > > > + ret = kstrtou32(buf, 10, &max_time); > > > + if (ret) > > > + return ret; > > > + > > > + mutex_lock(&umwait_lock); > > > + > > > + /* Only get max time value from bits [31:2] */ > > > + max_time &= UMWAIT_CONTROL_MAX_TIME_MASK; > > > + /* Update the max time value in memory */ > > > + umwait_max_time = max_time; > > > + msr_val = umwait_control_val(); > > > + get_online_cpus(); > > > + /* All CPUs have same umwait max time */ > > > + for_each_online_cpu(cpu) > > > + wrmsr_on_cpu(cpu, MSR_IA32_UMWAIT_CONTROL, msr_val, 0); > > > + put_online_cpus(); > > > + > > > + mutex_unlock(&umwait_lock); > > > + > > > + return count; > > > +} > > > + > > > +static DEVICE_ATTR_RW(umwait_max_time); > > > + > > > static struct attribute *umwait_attrs[] = { > > > &dev_attr_umwait_enable_c0_2.attr, > > > + &dev_attr_umwait_max_time.attr, > > > NULL > > > }; > > > > You need something to make sure that newly onlined CPUs get the right > > value in the MSR. > > Onlined CPU takes the umwait_control value in umwait_cpu_online() in > patch 2. Please check if it's correct. > > > You also need to make sure you restore it on resume > > from suspend. Something like cpu_init() might be the right place. > > > > Also, as previously discussed, I think we should set the default to > > something quite small, maybe 100 microseconds. IMO the goal is to > > pick a value that is a high enough multiple of the C0.2 entry+exit > > latency that we get most of the power and SMT resource savings while > > being small enough that no one things that UMWAIT is more than a > > glorified, slightly improved, and far more misleading version of REP > > NOP. > > > > Andrew, would having Linux default to a small value do much to > > mitigate your concerns that UMWAIT is problematic for hypervisors? > > > > Also, can someone who understands the hardware clarify just how > > dangerous UMWAIT is from a perspective of making speculation attacks > > more dangerous than they already are? I'm wondering what events will > > wake up a UMONITOR. I bet that CLFLUSH does. I wonder if a faulting > > write to a read-only page also does. Or a load from a remote node. > > Or a speculative store that does not subsequently retire. This > > instruction seems quite delightful as a tool to create a > > highish-bandwidth covert channel, and it's possibly quite nice to > > agument Spectre-like attacks. If it ends up being bad enough, we > > might need to set the default timeout to the absolute minimum value > > and possibly ask Intel to give us a way to turn it off entirely. > > If CR4.TSD=1 and CPL>0, umwait and tpause generate #GP. So if user thinks > the instructions are dangerous, he can disable TSC. > > Is this the right handling for security concerns? > Setting CR4.TSD=1 systemwide would utterly destroy performance of quite a few workloads. And my argument for setting the value to a lowish but not minimal value is about functionality, not security.