Subject: Re: [RFC PATCH 00/16] Core scheduling v6
To: Joel Fernandes
Cc: viremana@linux.microsoft.com,
    Nishanth Aravamudan, Julien Desfossez, Peter Zijlstra, Tim Chen,
    Ingo Molnar, Thomas Gleixner, Paul Turner, Linus Torvalds, LKML,
    Subhra Mazumdar, Frederic Weisbecker, Kees Cook, Greg Kerr,
    Phil Auld, Aaron Lu, Aubrey Li, Valentin Schneider, Mel Gorman,
    Pawan Gupta, Paolo Bonzini, Joel Fernandes, Vineeth Pillai,
    Chen Yu, Christian Brauner, "Ning, Hongyu", benbjiang(蒋彪)
References: <6d0f9fc0-2e34-f559-29bc-4143e6d3f751@linux.intel.com>
From: "Li, Aubrey"
Date: Wed, 5 Aug 2020 11:57:20 +0800

On 2020/8/4 0:53, Joel Fernandes wrote:
> Hi Aubrey,
>
> On Mon, Aug 3, 2020 at 4:23 AM Li, Aubrey wrote:
>>
>> On 2020/7/1 5:32, Vineeth Remanan Pillai wrote:
>>> Sixth iteration of the Core-Scheduling feature.
>>>
>>> Core scheduling is a feature that allows only trusted tasks to run
>>> concurrently on CPUs sharing compute resources (e.g., hyperthreads on
>>> a core). The goal is to mitigate core-level side-channel attacks
>>> without requiring SMT to be disabled (which has a significant impact
>>> on performance in some situations). Core scheduling (as of v6)
>>> mitigates user-space-to-user-space attacks, and user-to-kernel
>>> attacks when one of the siblings enters the kernel via interrupts.
>>> It is still possible for a task to attack the sibling thread when it
>>> enters the kernel via syscalls.
>>>
>>> By default, the feature doesn't change any of the current scheduler
>>> behavior. The user decides which tasks can run simultaneously on the
>>> same core (for now by having them in the same tagged cgroup). When a
>>> tag is enabled in a cgroup and a task from that cgroup is running on
>>> a hardware thread, the scheduler ensures that only idle or trusted
>>> tasks run on the other sibling(s). Besides security concerns, this
>>> feature can also be beneficial for RT and performance applications
>>> where we want to control how tasks make use of SMT dynamically.
>>>
>>> This iteration is mostly a cleanup of v5, except for one major new
>>> feature: pausing the sibling when a CPU enters the kernel via
>>> NMI/IRQ/softirq. It also introduces documentation and includes minor
>>> crash fixes.
>>>
>>> One major cleanup was removing the hotplug support and related code.
>>> The hotplug-related crashes were not documented, and the fixes piled
>>> up over time, leading to complex code. We were not able to reproduce
>>> the crashes in the limited testing done. But if they are
>>> reproducible, we don't want to hide them. We should document them
>>> and design better fixes if needed.
>>>
>>> In terms of performance, the results in this release are similar to
>>> v5. On an x86 system with N hardware threads:
>>> - if only N/2 hardware threads are busy, the performance is similar
>>>   between baseline, corescheduling and nosmt
>>> - if N hardware threads are busy with N different corescheduling
>>>   groups, the impact of corescheduling is similar to nosmt
>>> - if N hardware threads are busy and multiple active threads share
>>>   the same corescheduling cookie, they gain a performance
>>>   improvement over nosmt.
>>> The specific performance impact depends on the workload, but for a
>>> really busy database 12-vcpu VM (1 coresched tag) running on a
>>> 36-hardware-thread NUMA node with 96 mostly idle neighbor VMs (each
>>> in their own coresched tag), performance drops by 54% with
>>> corescheduling and by 90% with nosmt.
>>>
>>
>> We found uperf (in a cgroup) throughput drops by ~50% with core
>> scheduling.
>>
>> The problem is that uperf triggers a lot of softirqs and offloads
>> softirq service to the *ksoftirqd* thread.
>>
>> - by default, the ksoftirqd thread can run with uperf on the same
>>   core; we saw 100% CPU utilization.
>> - with coresched enabled, ksoftirqd's core cookie is different from
>>   uperf's, so they can't run concurrently on the same core; we saw
>>   ~15% forced idle.
>>
>> I guess this kind of performance drop can be replicated by other
>> similar workloads (with a lot of softirq activity).
>>
>> Currently the core scheduler picks cookie-matched tasks for all SMT
>> siblings. Does it make sense to add a policy that allows
>> cookie-compatible tasks to run together? For example, if a task is
>> trusted (set by the admin), it can work with kernel threads. The
>> difference from disabling core scheduling is that we still keep
>> user-to-user isolation.
>
> In ChromeOS we are considering all cookie-0 tasks as trusted.
> Basically if you don't trust a task, then that is when you assign the
> task a tag. We do this for the sandboxed processes.

I have a proposal for this, by changing cpu.tag to cpu.coresched_policy,
something like the following:

+/*
+ * Core scheduling policy:
+ * - CORE_SCHED_DISABLED: core scheduling is disabled.
+ * - CORE_SCHED_COOKIE_MATCH: tasks with the same cookie can run
+ *   on the same core concurrently.
+ * - CORE_SCHED_COOKIE_TRUST: a trusted task can run with kernel
+ *   threads on the same core concurrently.
+ * - CORE_SCHED_COOKIE_LONELY: tasks with a cookie can run only
+ *   with the idle thread on the same core.
+ */
+enum coresched_policy {
+       CORE_SCHED_DISABLED,
+       CORE_SCHED_COOKIE_MATCH,
+       CORE_SCHED_COOKIE_TRUST,
+       CORE_SCHED_COOKIE_LONELY,
+};

We can set the uperf cgroup's policy to CORE_SCHED_COOKIE_TRUST to fix
this kind of performance regression. Not sure if this sounds attractive?

> Is the uperf throughput worse with SMT+core-scheduling versus no-SMT ?

This is a good question. From the data we measured with uperf,
SMT+core-scheduling is 28.2% worse than no-SMT, :(

Thanks,
-Aubrey

> thanks,
>
> - Joel
> PS: I am planning to write a patch behind a CONFIG option that tags
> all processes (default untrusted) so everything gets a cookie, which
> some folks said was what they wanted (a whitelist instead of a
> blacklist).
>
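
[Editor's sketch] The forced-idle numbers above fall out of the
core-wide cookie comparison in the v6 series. A minimal sketch of that
rule follows; the field and function names only approximate the patch
set and are not the exact kernel code:

	#include <stdbool.h>

	/* Illustrative stand-in for the scheduled entity. */
	struct task {
		unsigned long core_cookie;	/* 0 = untagged */
	};

	/*
	 * Once one sibling runs a task with `core_cookie`, the other
	 * sibling may only pick a task whose cookie matches exactly;
	 * otherwise it is forced idle. ksoftirqd is untagged (cookie 0),
	 * so it can never match a tagged uperf task -- hence the ~15%
	 * forced idle reported above.
	 */
	static bool cookie_match(const struct task *p,
				 unsigned long core_cookie)
	{
		return p->core_cookie == core_cookie;
	}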
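
[Editor's sketch] And one way the proposed cpu.coresched_policy could
relax that rule. Only the enum comes from the mail above; the
compatibility helper, the is_kthread flag, and the policy plumbing are
assumptions made for illustration:

	#include <stdbool.h>

	enum coresched_policy {
		CORE_SCHED_DISABLED,
		CORE_SCHED_COOKIE_MATCH,
		CORE_SCHED_COOKIE_TRUST,
		CORE_SCHED_COOKIE_LONELY,
	};

	/* Hypothetical task representation for this sketch. */
	struct task {
		unsigned long core_cookie;	/* 0 = untagged */
		bool is_kthread;		/* e.g. ksoftirqd */
	};

	/*
	 * May `candidate` run on the sibling while `selected` runs on
	 * this hardware thread? `policy` would come from the selected
	 * task's cgroup.
	 */
	static bool cookie_compatible(const struct task *selected,
				      enum coresched_policy policy,
				      const struct task *candidate)
	{
		switch (policy) {
		case CORE_SCHED_DISABLED:
			return true;		/* no core-level isolation */
		case CORE_SCHED_COOKIE_TRUST:
			/* Trusted: untagged kernel threads may share the core. */
			if (candidate->is_kthread && candidate->core_cookie == 0)
				return true;
			/* otherwise fall back to an exact cookie match */
			return candidate->core_cookie == selected->core_cookie;
		case CORE_SCHED_COOKIE_MATCH:
			return candidate->core_cookie == selected->core_cookie;
		case CORE_SCHED_COOKIE_LONELY:
			return false;		/* only the idle thread */
		}
		return false;
	}

Under CORE_SCHED_COOKIE_TRUST, uperf (tagged, trusted) and ksoftirqd
(untagged kthread) could again share a core, recovering the
100%-utilization case while still keeping user-to-user isolation.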