Date: Wed, 7 Feb 2018 09:42:54 +0100
From: Peter Zijlstra
To: Subhra Mazumdar
Cc: Steven Sistare, linux-kernel@vger.kernel.org, mingo@redhat.com, dhaval.giani@oracle.com
Subject: Re: [RESEND RFC PATCH V3] sched: Improve scalability of select_idle_sibling using SMT balance
Message-ID: <20180207084254.GI2269@hirez.programming.kicks-ass.net>
In-Reply-To: <97500234-ebbb-4404-d4de-ab10d3ec79e1@oracle.com>

On Tue, Feb 06, 2018 at 04:30:03PM -0800, Subhra Mazumdar wrote:
> I meant the SMT balance patch. That does comparison with only one other
> random core and takes the decision in O(1). Any potential scan of all cores
> or cpus is O(n) and doesn't scale and will only get worse in future. That
> applies to both select_idle_core() and select_idle_cpu().

We only do the full scan if we think to know there is indeed an idle core
to be had, and if there are idle cores the machine isn't terribly busy.

If there are no idle cores, we do not in fact scan everything. We limit
the amount of scanning based on the average idle time with a minimum of 4.
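As a rough illustration of the proportional limit described above, here is a simplified userspace model in C. The struct, the function name `scan_limit`, the `sis_prop` flag and the cap at the domain size are made up for this sketch. The divide-by-512 scaling and the `+ 1` on the cost are assumed to mirror what select_idle_cpu() did in kernels of that period and should be checked against the actual tree.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the scheduler's bookkeeping. */
struct scan_inputs {
	uint64_t avg_idle;        /* average idle time of the waking runqueue, ns */
	uint64_t avg_scan_cost;   /* average cost of previous scans, ns */
	unsigned int span_weight; /* number of CPUs in the LLC domain */
};

#define SCAN_MIN 4u /* the "minimum of 4" from the text */

/*
 * Return how many CPUs a wakeup may inspect when no idle core is known.
 * sis_prop == 0 models switching the proportional limit off, in which
 * case the whole domain is scanned unconditionally.
 */
static unsigned int scan_limit(const struct scan_inputs *in, int sis_prop)
{
	uint64_t avg_idle, avg_cost, span_avg, nr;

	if (!sis_prop)
		return in->span_weight;

	avg_idle = in->avg_idle / 512;    /* assumed scaling, see lead-in */
	avg_cost = in->avg_scan_cost + 1; /* + 1 avoids a division by zero */
	span_avg = (uint64_t)in->span_weight * avg_idle;

	if (span_avg > SCAN_MIN * avg_cost) {
		nr = span_avg / avg_cost;
		/* Cap added for this sketch: never budget more CPUs than exist. */
		if (nr > in->span_weight)
			nr = in->span_weight;
		return (unsigned int)nr;
	}

	/* Little idle time relative to scan cost: fall back to the floor. */
	return SCAN_MIN;
}
```

With 16 CPUs in the domain, an average idle time of 512000 ns and an average scan cost of 999 ns, this model gives a budget of 16. Drop the idle time to 51200 ns and the budget falls back to the minimum of 4.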
(Except when you switch off things like SIS_PROP, then you scan
unconditionally and reduce tail latency at the cost of throughput -- like
said, some people want this).

O(1) sounds nice, but it has horrible worst case latencies.

And like I said, I don't see how a rotor is particularly more random than
the previous cpu the task you happen to be waking ran on.

> Is there any reason this randomized approach is not acceptable even if
> benchmarks show improvement? Are there other benchmarks I should try?

Looking at just one other core has a fairly high chance of not finding
idle. I really cannot remember all the benchmarks, but Mike did something
a little less random but still O(1) a few years back and that crashed and
burned.

> Also your suggestion to keep the SMT utilization but still do a traversal of
> cores
> in select_idle_core() while remembering the least loaded core will still
> have
> the problem of potentially traversing all cores. I can compare this with a
> core
> level only SMT balancing, is that useful to decide? I will also test on
> SPARC
> machines with higher degree of SMT.

Please learn to use your email client, that's near unreadable for
whitespace damage, time is really too precious to try and untangle crap
like that.

> You had also mentioned to do it for only SMT >2, not sure I understand why
> as even for SMT=2 (intel) benchmarks show improvement. This clearly shows
> the scalability problem.

For SMT2 you don't need the occupation counter with atomic crud, a simple
!atomic core-idle works just fine. And your 'patch' had so many moving
parts it was too hard to say which aspect changed what.

hackbench really isn't _that_ interesting a benchmark and it's fairly easy
to make it go faster, but doing so typically wrecks something else.
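The remark about probing just one other core can be made concrete with a back-of-the-envelope model. The sketch below is an illustration only, not anything from the patch or the kernel: it assumes idle cores are spread uniformly at random and that each probe is an independent pick (with replacement), and the function name `miss_probability` is invented here.

```c
/*
 * Probability that `probes` independent, uniformly random probes all land
 * on busy cores, given that `idle` out of `total` cores are idle.
 * Independence and uniformity are modeling assumptions of this sketch.
 */
static double miss_probability(unsigned int idle, unsigned int total,
			       unsigned int probes)
{
	double miss_one, p = 1.0;
	unsigned int i;

	if (total == 0)
		return 1.0; /* nothing to find */
	if (idle > total)
		idle = total;

	/* Chance that a single random probe hits a busy core. */
	miss_one = 1.0 - (double)idle / (double)total;

	/* All probes must miss for the search to come back empty. */
	for (i = 0; i < probes; i++)
		p *= miss_one;

	return p;
}
```

Under these assumptions, with 2 of 8 cores idle a single probe misses 75% of the time, while four probes still miss about 32% of the time.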

There's a whole array of netperf benchmarks which you have to run at
different utilization levels, there's the sysbench oltp stuff, which you
can run on MariaDB and PostgreSQL, again at different utilization levels
(both databases behave quite differently). There's ebizzy, tbench, dbench
and others.

There's also NUMA stuff, like NAS benchmarks and specJBB nonsense.

There's the facebook schbench, which is a bit finicky to set up.

And I'm sure I'm forgetting half, look at the patches and regression
reports for more clues, that's what Google is for. You could have figured
all that out yourself, many of these are listed in the changelogs and if
you'd bothered to find lkml regression reports you could find more.