Date: Mon, 5 Feb 2018 13:48:54 +0100
From: Peter Zijlstra
To: Steven Sistare
Cc: subhra mazumdar, linux-kernel@vger.kernel.org, mingo@redhat.com,
	dhaval.giani@oracle.com
Subject: Re: [RESEND RFC PATCH V3] sched: Improve scalability of select_idle_sibling using SMT balance
Message-ID: <20180205124854.GX2269@hirez.programming.kicks-ass.net>
References: <20180129233102.19018-1-subhra.mazumdar@oracle.com>
	<20180201123335.GV2249@hirez.programming.kicks-ass.net>
	<911d42cf-54c7-4776-c13e-7c11f8ebfd31@oracle.com>
	<20180202195943.GR2269@hirez.programming.kicks-ass.net>
	<25d67bd2-cbe7-2c2a-e89a-13a7ca5adc10@oracle.com>
In-Reply-To: <25d67bd2-cbe7-2c2a-e89a-13a7ca5adc10@oracle.com>

On Fri, Feb 02, 2018 at 04:06:32PM -0500, Steven Sistare wrote:
> On 2/2/2018 2:59 PM, Peter Zijlstra wrote:
> > But then you get that atomic crud to contend on the cluster level,
> > which is even worse than it contending on the core level.
>
> True, but it can still be a net win if we make better scheduling
> decisions. A saving grace is that the atomic counter is only updated
> if the cpu makes a transition from idle to busy or vice versa.

Which can still be a very high rate for some workloads.
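For reference, the counting scheme being discussed has roughly this
shape; a userspace sketch with made-up names, not the actual patch:

/*
 * One atomic busy-CPU count per core (or cluster). Writers only touch
 * it on an idle <-> busy transition of a CPU; wakeup placement only
 * reads it, so readers don't contend on the line.
 */
#include <stdatomic.h>
#include <stdbool.h>

struct core_smt_state {
	atomic_int nr_busy;		/* busy CPUs in this core */
};

/* idle -> busy transition */
static inline void smt_mark_busy(struct core_smt_state *core)
{
	atomic_fetch_add_explicit(&core->nr_busy, 1, memory_order_relaxed);
}

/* busy -> idle transition */
static inline void smt_mark_idle(struct core_smt_state *core)
{
	atomic_fetch_sub_explicit(&core->nr_busy, 1, memory_order_relaxed);
}

/* A core is a placement candidate while it has free SMT capacity. */
static inline bool smt_has_capacity(struct core_smt_state *core,
				    int smt_width)
{
	return atomic_load_explicit(&core->nr_busy,
				    memory_order_relaxed) < smt_width;
}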
I always forget which, but there are plenty of workloads that have very
frequent, very short idle times. Mike, do you remember what comes apart
when we take out the sysctl_sched_migration_cost test in idle_balance()?

> We need data for this type of system, showing improvements for normal
> workloads, and showing little downside for a high context switch rate
> torture test.

So core-wide atomics should, on architectures that can do atomics in L1,
be relatively fast. Once you leave L1, atomic contention goes up a fair
bit. And then there are architectures that simply don't do atomics in L1
(like Power).

Testing on my SKL desktop, atomics contending between SMT siblings are
basically free (weirdly enough, my test says it's cheaper), atomics
contending on the L3 are 10x as expensive, and this is with only 4
cores. If I test it on my 10 core IVB, I'm up to 20x, and I can't
imagine that getting any better with bigger core counts (my IVB does not
show SMT contention as lower, but not particularly more expensive
either).

So while I see the point of tracking these numbers (for SMT>2), I don't
think it's worth doing outside of the core, and then we still need some
powerpc (or any other architecture with abysmal atomics) tested.

So what we can do is make this counting crud conditional on SMT>2, and
possibly part of the topology flags, such that an architecture can opt
out; see the sketches below.

Then select_idle_core() can be augmented to remember the least-loaded
core it encounters in its traversal, and go with that.
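The shape of the contention test mentioned above is something like the
below; a reconstruction, not the exact program, and the CPU numbers
assume CPUs 0 and 4 are SMT siblings while 0 and 1 are different cores,
so adjust for your topology:

/* gcc -O2 -pthread atomic-contend.c -o atomic-contend */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define ITERS	100000000UL

static atomic_long counter;

static void *hammer(void *arg)
{
	int cpu = (int)(long)arg;
	cpu_set_t set;

	/* Pin this thread so we control where the cacheline bounces. */
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

	for (unsigned long i = 0; i < ITERS; i++)
		atomic_fetch_add(&counter, 1);

	return NULL;
}

static double run(int cpu_a, int cpu_b)
{
	struct timespec t0, t1;
	pthread_t a, b;

	atomic_store(&counter, 0);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	pthread_create(&a, NULL, hammer, (void *)(long)cpu_a);
	pthread_create(&b, NULL, hammer, (void *)(long)cpu_b);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
	/* Sibling lists live in
	 * /sys/devices/system/cpu/cpu0/topology/thread_siblings_list */
	printf("SMT siblings: %.2fs\n", run(0, 4));
	printf("cross core:   %.2fs\n", run(0, 1));

	return 0;
}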
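The opt-out gate could be as simple as the below; SD_SMT_COUNTING is a
made-up flag name, only the shape of the check matters:

/*
 * Only do the accounting when there are more than 2 siblings per core
 * and the architecture hasn't opted out (e.g. powerpc could leave the
 * hypothetical SD_SMT_COUNTING topology flag unset).
 */
static inline bool smt_counting_wanted(struct sched_domain *sd, int cpu)
{
	return cpumask_weight(cpu_smt_mask(cpu)) > 2 &&
	       (sd->flags & SD_SMT_COUNTING);
}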
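And the select_idle_core() change could look something like this; an
untested sketch against the 4.15-era code, with the idle-cores
bookkeeping (test_idle_cores()/set_idle_cores()) elided:

static int select_idle_core(struct task_struct *p, struct sched_domain *sd,
			    int target)
{
	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
	int best_core = -1, best_idle = 0;
	int core, cpu;

	if (!static_branch_likely(&sched_smt_present))
		return -1;

	cpumask_and(cpus, sched_domain_span(sd), &p->cpus_allowed);

	for_each_cpu_wrap(core, cpus, target) {
		int nr_idle = 0;

		for_each_cpu(cpu, cpu_smt_mask(core)) {
			cpumask_clear_cpu(cpu, cpus);
			if (idle_cpu(cpu))
				nr_idle++;
		}

		/* Entirely idle core; use it immediately. */
		if (nr_idle == cpumask_weight(cpu_smt_mask(core)))
			return core;

		/* Otherwise remember the least-loaded core seen so far. */
		if (nr_idle > best_idle) {
			best_idle = nr_idle;
			best_core = core;
		}
	}

	return best_core;
}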