Date: Tue, 19 Feb 2019 16:15:32 +0100
From: Ingo Molnar
To: Linus Torvalds
Cc: Peter Zijlstra, Thomas Gleixner, Paul Turner, Tim Chen,
    Linux List Kernel Mailing, subhra.mazumdar@oracle.com,
    Frédéric Weisbecker, Kees Cook, kerrnel@google.com
Subject: Re: [RFC][PATCH 00/16] sched: Core scheduling
Message-ID: <20190219151532.GA40581@gmail.com>
References: <20190218165620.383905466@infradead.org>
 <20190218204020.GV32494@hirez.programming.kicks-ass.net>

* Linus Torvalds wrote:

> On Mon, Feb 18, 2019 at 12:40 PM Peter Zijlstra wrote:
> >
> > If there were close to no VMEXITs, it beat smt=off; if there were lots
> > of VMEXITs it was far, far worse. Supposedly hosting people try their
> > very bestest to have no VMEXITs, so it mostly works for them (with the
> > obvious exception of single-VCPU guests).
> >
> > It's just that people have been bugging me for this crap; and I figured
> > I'd post it now that it's not exploding anymore and let others have at
> > it.
>
> The patches didn't look disgusting to me, but I admittedly just
> scanned through them quickly.
>
> Are there downsides (maintenance and/or performance) when core
> scheduling _isn't_ enabled? I guess if it's not a maintenance or
> performance nightmare when off, it's OK to just give people the
> option.

So this bit is the main straight-line performance impact when the
CONFIG_SCHED_CORE Kconfig feature is present (which I expect distros to
enable broadly):

+static inline bool sched_core_enabled(struct rq *rq)
+{
+	return static_branch_unlikely(&__sched_core_enabled) && rq->core_enabled;
+}

 static inline raw_spinlock_t *rq_lockp(struct rq *rq)
 {
+	if (sched_core_enabled(rq))
+		return &rq->core->__lock;
+
 	return &rq->__lock;
 }

This should, at least in principle, keep the runtime overhead down to a
few NOPs and a slightly larger instruction-cache footprint - modulo
compiler shenanigans.
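For reference, the reason the disabled case is nearly free is that
static_branch_unlikely() is backed by the kernel's static-key/jump-label
machinery: the branch site compiles to a NOP that only gets
binary-patched into a jump when the key is enabled. A minimal sketch of
the usual pattern - the enable-path helper name here is illustrative,
not necessarily how the patch set wires it up:

	/* Defined once; this key defaults to false/off: */
	DEFINE_STATIC_KEY_FALSE(__sched_core_enabled);

	/*
	 * Hot path: a NOP while the key is off, patched into a jump
	 * once the key is enabled.
	 */
	static inline bool sched_core_enabled(struct rq *rq)
	{
		return static_branch_unlikely(&__sched_core_enabled) &&
		       rq->core_enabled;
	}

	/* Slow path, run when the feature is first switched on
	 * (illustrative helper name): */
	static void sched_core_enable_key(void)
	{
		static_branch_enable(&__sched_core_enabled);
	}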
Here's the code generation impact on x86-64 defconfig:

   text    data     bss     dec     hex filename
    228      48       0     276     114 sched.core.n/cpufreq.o (ex sched.core.n/built-in.a)
    228      48       0     276     114 sched.core.y/cpufreq.o (ex sched.core.y/built-in.a)
   4438      96       0    4534    11b6 sched.core.n/completion.o (ex sched.core.n/built-in.a)
   4438      96       0    4534    11b6 sched.core.y/completion.o (ex sched.core.y/built-in.a)
   2167    2428       0    4595    11f3 sched.core.n/cpuacct.o (ex sched.core.n/built-in.a)
   2167    2428       0    4595    11f3 sched.core.y/cpuacct.o (ex sched.core.y/built-in.a)
  61099   22114     488   83701   146f5 sched.core.n/core.o (ex sched.core.n/built-in.a)
  70541   25370     508   96419   178a3 sched.core.y/core.o (ex sched.core.y/built-in.a)
   3262    6272       0    9534    253e sched.core.n/wait_bit.o (ex sched.core.n/built-in.a)
   3262    6272       0    9534    253e sched.core.y/wait_bit.o (ex sched.core.y/built-in.a)
  12235     341      96   12672    3180 sched.core.n/rt.o (ex sched.core.n/built-in.a)
  13073     917      96   14086    3706 sched.core.y/rt.o (ex sched.core.y/built-in.a)
  10293     477    1928   12698    319a sched.core.n/topology.o (ex sched.core.n/built-in.a)
  10363     509    1928   12800    3200 sched.core.y/topology.o (ex sched.core.y/built-in.a)
    886      24       0     910     38e sched.core.n/cpupri.o (ex sched.core.n/built-in.a)
    886      24       0     910     38e sched.core.y/cpupri.o (ex sched.core.y/built-in.a)
   1061      64       0    1125     465 sched.core.n/stop_task.o (ex sched.core.n/built-in.a)
   1077     128       0    1205     4b5 sched.core.y/stop_task.o (ex sched.core.y/built-in.a)
  18443     365      24   18832    4990 sched.core.n/deadline.o (ex sched.core.n/built-in.a)
  20019    2189      24   22232    56d8 sched.core.y/deadline.o (ex sched.core.y/built-in.a)
   1123       8      64    1195     4ab sched.core.n/loadavg.o (ex sched.core.n/built-in.a)
   1123       8      64    1195     4ab sched.core.y/loadavg.o (ex sched.core.y/built-in.a)
   1323       8       0    1331     533 sched.core.n/stats.o (ex sched.core.n/built-in.a)
   1323       8       0    1331     533 sched.core.y/stats.o (ex sched.core.y/built-in.a)
   1282     164      32    1478     5c6 sched.core.n/isolation.o (ex sched.core.n/built-in.a)
   1282     164      32    1478     5c6 sched.core.y/isolation.o (ex sched.core.y/built-in.a)
   1564      36       0    1600     640 sched.core.n/cpudeadline.o (ex sched.core.n/built-in.a)
   1564      36       0    1600     640 sched.core.y/cpudeadline.o (ex sched.core.y/built-in.a)
   1640      56       0    1696     6a0 sched.core.n/swait.o (ex sched.core.n/built-in.a)
   1640      56       0    1696     6a0 sched.core.y/swait.o (ex sched.core.y/built-in.a)
   1859     244      32    2135     857 sched.core.n/clock.o (ex sched.core.n/built-in.a)
   1859     244      32    2135     857 sched.core.y/clock.o (ex sched.core.y/built-in.a)
   2339       8       0    2347     92b sched.core.n/cputime.o (ex sched.core.n/built-in.a)
   2339       8       0    2347     92b sched.core.y/cputime.o (ex sched.core.y/built-in.a)
   3014      32       0    3046     be6 sched.core.n/membarrier.o (ex sched.core.n/built-in.a)
   3014      32       0    3046     be6 sched.core.y/membarrier.o (ex sched.core.y/built-in.a)
  50027     964      96   51087    c78f sched.core.n/fair.o (ex sched.core.n/built-in.a)
  51537    2484      96   54117    d365 sched.core.y/fair.o (ex sched.core.y/built-in.a)
   3192     220       0    3412     d54 sched.core.n/idle.o (ex sched.core.n/built-in.a)
   3276     252       0    3528     dc8 sched.core.y/idle.o (ex sched.core.y/built-in.a)
   3633       0       0    3633     e31 sched.core.n/pelt.o (ex sched.core.n/built-in.a)
   3633       0       0    3633     e31 sched.core.y/pelt.o (ex sched.core.y/built-in.a)
   3794     160       0    3954     f72 sched.core.n/wait.o (ex sched.core.n/built-in.a)
   3794     160       0    3954     f72 sched.core.y/wait.o (ex sched.core.y/built-in.a)

I'd say this one is representative:

   text    data     bss     dec     hex filename
  12235     341      96   12672    3180 sched.core.n/rt.o (ex sched.core.n/built-in.a)
  13073     917      96   14086    3706 sched.core.y/rt.o (ex sched.core.y/built-in.a)

The ~6% text bloat is primarily due to the higher rq-lock inlining
overhead, I believe. This is roughly what you'd expect from a change
wrapping all 350+ inlined instantiations of rq->lock uses - i.e. it
might make sense to uninline it.
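For illustration, uninlining could look something like this - a sketch
only, assuming the helper keeps its current name and the
CONFIG_SCHED_CORE=n case stays inline:

	/* kernel/sched/sched.h: */
	#ifdef CONFIG_SCHED_CORE
	/* One out-of-line copy instead of 350+ inlined instantiations: */
	extern raw_spinlock_t *rq_lockp(struct rq *rq);
	#else
	static inline raw_spinlock_t *rq_lockp(struct rq *rq)
	{
		return &rq->__lock;
	}
	#endif

	/* kernel/sched/core.c: */
	#ifdef CONFIG_SCHED_CORE
	raw_spinlock_t *rq_lockp(struct rq *rq)
	{
		if (sched_core_enabled(rq))
			return &rq->core->__lock;

		return &rq->__lock;
	}
	#endif

The trade-off is an extra function call on every rq-lock operation in
the CONFIG_SCHED_CORE=y case, so it would need measuring.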
In terms of long-term maintenance overhead, ignoring the overhead of the
core-scheduling feature itself, the rq-lock wrappery is the biggest
ugliness; the rest is mostly isolated.

So if this actually *works*, improves the performance of some real
VMEXIT-poor SMT workloads, and allows enabling HyperThreading with
untrusted VMs without inviting thousands of guest roots, then I'm
cautiously in support of it.

> That all assumes that it works at all for the people who are clamoring
> for this feature, but I guess they can run some loads on it eventually.
> It's a holiday in the US right now ("Presidents' Day"), but maybe we
> can get some numbers this week?

Such numbers would be *very* helpful indeed.

Thanks,

	Ingo