Date: Fri, 28 Feb 2020 10:54:05 +0800
From: Aaron Lu
To: Phil Auld
Cc: Vineeth Remanan Pillai, Aubrey Li, Tim Chen, Julien Desfossez,
	Nishanth Aravamudan, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Paul Turner, Linus Torvalds, Linux List Kernel Mailing,
	Dario Faggioli, Frédéric Weisbecker, Kees Cook, Greg Kerr,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini
Subject: Re: [RFC PATCH v4 00/19] Core scheduling v4
Message-ID: <20200228025405.GA634650@ziqianlu-desktop.localdomain>
References: <3c3c56c1-b8dc-652c-535e-74f6dcf45560@linux.intel.com>
	<20200212230705.GA25315@sinkpad>
	<29d43466-1e18-6b42-d4d0-20ccde20ff07@linux.intel.com>
	<20200225034438.GA617271@ziqianlu-desktop.localdomain>
	<20200227020432.GA628749@ziqianlu-desktop.localdomain>
	<20200227141032.GA30178@pauld.bos.csb>
In-Reply-To: <20200227141032.GA30178@pauld.bos.csb>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Feb 27, 2020 at 09:10:33AM -0500, Phil Auld wrote:
> Hi Aaron,
>
> On Thu, Feb 27, 2020 at 10:04:32AM +0800, Aaron Lu wrote:
> > On Tue, Feb 25, 2020 at 03:51:37PM -0500, Vineeth Remanan Pillai wrote:
> > > > On a 2 sockets/16 cores/32 threads VM, I grouped 8 sysbench (cpu
> > > > mode) threads into one cgroup (cgA) and another 16 sysbench (cpu
> > > > mode) threads into another cgroup (cgB). cgA's and cgB's cpusets
> > > > are set to the same socket's 8 cores/16 CPUs, and cgA's cpu.shares
> > > > is set to 10240 while cgB's cpu.shares is set to 2 (so consider
> > > > cgB the noise workload and cgA the real workload).
> > > >
> > > > I had expected cgA to occupy 8 cpus (with each cpu on a different
> > > > core).
> > >
> > > The expected behaviour could also be that the 8 processes share 4
> > > cores and 8 hw threads, right? This is what we are seeing mostly.
> >
> > I expect the 8 cgA tasks to spread with one per core, instead of
> > occupying 4 cores/8 hw threads. If they stay on 4 cores/8 hw threads,
> > then at the core level those cores' load would be much higher than
> > that of the cores running cgB's tasks, which doesn't look right to me.
>
> I don't think that's a valid assumption, at least since the load
> balancer rework. The scheduler will be looking much more at the number
> of running tasks versus the group weight. So in this case, with 2
> running tasks on the 2 siblings at the core level, things will look
> fine and there will be no reason to migrate.

In the absence of core scheduling, I agree there is no reason to
migrate: no matter how the tasks are migrated, the end result is one
high-weight task sharing a core with another (high- or low-weight)
task. But with core scheduling, things can be different: if the
high-weight tasks are spread among the cores, each of them can have a
core to itself (by force idling its sibling) and get better
performance.
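To make the load-imbalance argument above concrete, here is a toy model
(my own illustration, not kernel code; the even per-task weight split
and the flat per-core load metric are simplifying assumptions):

```python
# Toy model of per-core load for the two placements discussed above,
# on one socket with 8 cores / 16 hw threads.

CORES = 8
CGA_SHARES, CGA_TASKS = 10240, 8   # high-weight "real" workload
CGB_SHARES, CGB_TASKS = 2, 16      # low-weight "noise" workload

wa = CGA_SHARES / CGA_TASKS        # per-task weight in cgA: 1280.0
wb = CGB_SHARES / CGB_TASKS        # per-task weight in cgB: 0.125

# Packed: 8 cgA tasks on 4 cores (2 per core), 16 cgB tasks on the
# remaining 4 cores (4 per core).
packed = [2 * wa] * 4 + [4 * wb] * 4

# Spread: 1 cgA task and 2 cgB tasks on every core.
spread = [wa + 2 * wb] * CORES

def imbalance(loads):
    """Difference between the busiest and the idlest core."""
    return max(loads) - min(loads)

print(imbalance(packed), imbalance(spread))  # 2559.5 0.0
```

Under this simple weight-based view the packed placement leaves the
cgA cores carrying vastly more load than the cgB cores, while the
spread placement is perfectly balanced at the core level; the point
above is that the post-rework load balancer counts running tasks per
sibling rather than this weight sum, so it sees nothing to fix.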
I'm thinking of using core scheduling to protect the main workload's
performance in a colocated environment, similar to the realtime use
case described here: https://lwn.net/Articles/799454/

I'll quote the relevant part:

"
But core scheduling can force sibling CPUs to go idle when a realtime
process is running on the core, thus preventing this kind of
interference. That opens the door to enabling SMT whenever a core has
no realtime work, but effectively disabling it when realtime
constraints apply, getting the best of both worlds.
"

Using a cpuset that allows the main workload's tasks to run on only
one HT of each core might also solve this, but I had hoped not to need
cpusets, as they add deployment complexity.

> > I think the end result should be: each core has two tasks queued,
> > one cgA task and one cgB task (to maintain load balance at the core
> > level). The two tasks are queued on different hw threads, with cgA's
> > task running most of the time on one thread and cgB's task being
> > force-idled most of the time on the other thread.
>
> With the core scheduler that does not seem to be a desired outcome. I
> think grouping the 8 cgA tasks on the 8 cpus of 4 cores seems right.

When the core-wide weights are somewhat balanced, yes, I definitely
agree. But when the core-wide weights mismatch a lot, I'm not so sure:
if the high-weight tasks are spread among the cores, core scheduling
lets them get better performance. So this looks to me like a question
of: is it desirable to protect/enhance high-weight task performance in
the presence of core scheduling?
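As a rough sketch of that trade-off, here is a toy throughput model
(all numbers hypothetical, in particular the assumed SMT speedup; this
only illustrates the shape of the argument, not measured behaviour):

```python
# Toy throughput model for the 8 high-weight cgA tasks, assuming core
# scheduling force-idles the sibling of a core whose selected task
# vastly outweighs the sibling's candidate.

SMT_SPEEDUP = 1.3  # assumed combined throughput of 2 busy hw threads
                   # relative to 1 (SMT gains of ~20-30% are typical)

# Spread: one cgA task per core; cgB's tiny weight means it is almost
# never selected, so the sibling is force-idled and each cgA task
# effectively owns a full core.
spread_per_task = 1.0

# Packed: two cgA tasks share the two hw threads of one core and each
# gets half of the SMT-degraded combined throughput.
packed_per_task = SMT_SPEEDUP / 2

print(spread_per_task, packed_per_task)
```

Under these assumptions each spread cgA task runs at full-core speed
while each packed one runs at roughly 65% of it, which is why the
placement question matters once core scheduling is in the picture.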