Date: Wed, 26 Feb 2020 11:13:36 +0800
From: Aaron Lu
To: Vineeth Remanan Pillai
Cc: Aubrey Li, Tim Chen, Julien Desfossez, Nishanth Aravamudan,
 Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paul Turner,
 Linus Torvalds, Linux List Kernel Mailing, Dario Faggioli,
 Frédéric Weisbecker, Kees Cook, Greg Kerr, Phil Auld,
 Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini
Subject: Re: [RFC PATCH v4 00/19] Core scheduling v4
Message-ID: <20200226031336.GA622976@ziqianlu-desktop.localdomain>
References: <5e3cea14-28d1-bf1e-cabe-fb5b48fdeadc@linux.intel.com>
 <3c3c56c1-b8dc-652c-535e-74f6dcf45560@linux.intel.com>
 <20200212230705.GA25315@sinkpad>
 <29d43466-1e18-6b42-d4d0-20ccde20ff07@linux.intel.com>
 <20200225034438.GA617271@ziqianlu-desktop.localdomain>

On Tue, Feb 25, 2020 at 03:51:37PM -0500, Vineeth Remanan Pillai wrote:
> Hi Aaron,
> We tried reproducing this with a sample script here:
> https://gist.github.com/vineethrp/4356e66694269d1525ff254d7f213aef

Nice script.

> But the set1 cgroup processes always get their share of cpu time in
> our test. Could you please verify if it's the same test that you were
> also doing? The only difference is that we run on bare metal with two
> 16-core/32-thread sockets, using only socket 0. We also tried threads
> instead of processes, but the results are the same.

Sorry for missing one detail: I always start the noise workload first,
and then start the real workload. This is critical for this test,
because only then can the noise workload occupy all CPUs and present a
challenge to the load balancer when it places the real workload's
tasks. If both workloads are started at the same time, the initial
task placement might mitigate the problem.

BTW, your script gives 12 cores/24 CPUs to the workloads, yet cgA
spawns 16 tasks and cgB spawns 32. That is an even more complex
scenario to test, since the real workload's task count already exceeds
the number of available cores. Perhaps starting just 12 tasks for cgA
and 24 tasks for cgB is enough for now. As for the start sequence,
simply sleep 5 seconds after cgB's workload is started, then start
cgA. I have left a comment on the script's gist page.
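Roughly, the sequence I have in mind looks like the sketch below. This
is an illustration only, not your gist script: it assumes cgroup v1
mounts, sysbench 1.0 option syntax, memory node 0, and a CPU numbering
where CPU n's hyperthread sibling is CPU n+32; adjust all of these to
the real topology.

  # pin both groups to the same 12 cores / 24 CPUs of socket 0
  # (assumed sibling numbering: CPU n pairs with CPU n+32)
  CPUS=0-11,32-43

  for g in cgA cgB; do
      mkdir -p /sys/fs/cgroup/cpuset/$g /sys/fs/cgroup/cpu/$g
      echo $CPUS > /sys/fs/cgroup/cpuset/$g/cpuset.cpus
      echo 0     > /sys/fs/cgroup/cpuset/$g/cpuset.mems
  done
  echo 10240 > /sys/fs/cgroup/cpu/cgA/cpu.shares   # real workload
  echo 2     > /sys/fs/cgroup/cpu/cgB/cpu.shares   # noise workload

  # start the noise workload first so it occupies all CPUs
  # (older sysbench versions use --test=cpu --num-threads=N instead)
  sysbench --threads=24 --time=300 cpu run &
  # writing the pid to cgroup.procs moves the whole thread group;
  # worker threads spawned later inherit the membership
  echo $! > /sys/fs/cgroup/cpu/cgB/cgroup.procs
  echo $! > /sys/fs/cgroup/cpuset/cgB/cgroup.procs

  sleep 5   # then let the load balancer place the real workload

  sysbench --threads=12 --time=300 cpu run &
  echo $! > /sys/fs/cgroup/cpu/cgA/cgroup.procs
  echo $! > /sys/fs/cgroup/cpuset/cgA/cgroup.procs

  # check which CPUs cgA's tasks actually run on (PSR column)
  ps -L -o pid,tid,psr,pcpu,comm \
      -p $(cat /sys/fs/cgroup/cpu/cgA/cgroup.procs)

On a cgroup v2 (unified) mount the knobs differ, e.g. cpu.weight
instead of cpu.shares, so treat the paths above as placeholders.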

> > On a 2 sockets/16 cores/32 threads VM, I grouped 8 sysbench (cpu
> > mode) threads into one cgroup (cgA) and another 16 sysbench (cpu
> > mode) threads into another cgroup (cgB). cgA's and cgB's cpusets are
> > set to the same socket's 8 cores/16 CPUs; cgA's cpu.shares is set to
> > 10240 while cgB's cpu.shares is set to 2 (so consider cgB the noise
> > workload and cgA the real workload).
> >
> > I had expected cgA to occupy 8 cpus (with each cpu on a different core)
>
> The expected behaviour could also be that the 8 processes share 4
> cores and 8 hw threads, right? This is what we are seeing mostly.
>
> > most of the time since it has way more weight than cgB, while cgB
> > should occupy almost no CPUs since:
> > - when cgB's task is in the same CPU queue as cgA's task, then cgB's
> >   task is given very little CPU due to its small weight;
> > - when cgB's task is in a CPU queue whose sibling's queue has cgA's
> >   task, cgB's task should be forced idle (again, due to its small
> >   weight).
>
> We are seeing that cgA takes half the cores and cgB takes the other
> half. It looks like the scheduler ultimately groups the tasks onto
> their own cores.
>
> > But testing shows cgA occupies only 2 cpus during the entire run
> > while cgB enjoys the remaining 14 cpus. As a comparison, when
> > coresched is off, cgA can occupy 8 cpus during its run.
>
> Not sure why we are not able to reproduce this. I have a quick patch
> which might fix it. The idea is that we allow migration if p's
> hierarchical load or estimated utilization is more than
> dest_rq->curr's. While thinking about this fix, I noticed that we are
> not holding the dest_rq lock for any of the migration patches. The
> migration patches would probably need a rework. Attaching my patch
> below, but it also does not take the dest_rq lock. I have also added
> a case for the dest core being forced idle. I think that would be an
> opportunity to migrate. Ideally we should check if the forced idle
> task has the same cookie as p.
>
> https://gist.github.com/vineethrp/887743608f42a6ce96bf7847b5b119ae

Is this on top of Aubrey's coresched_v4-v5.5.2 branch?