Date: Fri, 13 Sep 2019 22:15:40 +0800
From: Aaron Lu <aaron.lu@linux.alibaba.com>
To: Tim Chen
Cc: Vineeth Remanan Pillai, Julien Desfossez, Dario Faggioli,
    "Li, Aubrey", Aubrey Li, Subhra Mazumdar, Nishanth Aravamudan,
    Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paul Turner,
    Linus Torvalds, Linux List Kernel Mailing, Frédéric Weisbecker,
    Kees Cook, Greg Kerr, Phil Auld, Valentin Schneider, Mel Gorman,
    Pawan Gupta, Paolo Bonzini
Subject: Re: [RFC PATCH v3 00/16] Core scheduling v3
Message-ID: <20190913141540.GB81644@aaronlu>
References: <7dc86e3c-aa3f-905f-3745-01181a3b0dac@linux.intel.com>
 <20190802153715.GA18075@sinkpad>
 <69cd9bca-da28-1d35-3913-1efefe0c1c22@linux.intel.com>
 <20190911140204.GA52872@aaronlu>
 <7b001860-05b4-4308-df0e-8b60037b8000@linux.intel.com>
 <20190912123532.GB16200@aaronlu>
 <8373e386-cb99-8f79-a78e-5e79dc962b81@linux.intel.com>
In-Reply-To: <8373e386-cb99-8f79-a78e-5e79dc962b81@linux.intel.com>

On Thu, Sep 12, 2019 at 10:29:13AM -0700, Tim Chen wrote:
> On 9/12/19 5:35 AM, Aaron Lu wrote:
> > On Wed, Sep 11, 2019 at 12:47:34PM -0400, Vineeth Remanan Pillai wrote:
> >
> > core wide vruntime makes sense when there are multiple tasks of
> > different cgroups queued on the same core. e.g. when there are two
> > tasks of cgroupA and one task of cgroupB are queued on the same core,
> > assume cgroupA's one task is on one hyperthread and its other task is on
> > the other hyperthread with cgroupB's task. With my current
> > implementation or Tim's, cgroupA will get more time than cgroupB.
>
> I think that's expected because cgroup A has two tasks and cgroup B
> has one task, so cgroup A should get twice the cpu time than cgroup B
> to maintain fairness.

Like you said below, the ideal run time of each cgroup should depend on
its individual weight. The fact that cgroupA has two tasks doesn't mean
it has twice the weight. Both cgroups can have the same cpu.shares
setting, and then the more tasks a cgroup has, the less weight the
cgroup's per-cpu se can get.

I just realized one thing that's different between your idle_allowance
implementation and my core_vruntime implementation: the idle_allowance
is absolute time, while vruntime is adjusted by the se's weight. That's
probably one area where your implementation can make things less fair
than mine.

> > If we
> > maintain core wide vruntime for cgroupA and cgroupB, we should be able
> > to maintain fairness between cgroups on this core.
>
> I don't think the right thing to do is to give cgroupA and cgroupB equal
> time on a core. The time they get should still depend on their
> load weight.

Agree.

> The better thing to do is to move one task from cgroupA to another core,
> that has only one cgroupA task so it can be paired up
> with that lonely cgroupA task. This will eliminate the forced idle time
> for cgroupA both on current core and also the migrated core.

I'm not sure this is always possible. Say on a 16-core/32-thread
machine there are 3 cgroups, each with 16 cpu-intensive tasks; will it
be possible to make things perfectly balanced? Don't get me wrong, I
think this kind of load balancing is good and needed, but I'm not sure
we can always make things perfectly balanced. And if not, do we care
about those few cores where cgroup tasks are not balanced, and do we
then need to implement the core-wide cgroup fairness functionality, or
do we not care since those cores are supposed to be few and it isn't a
big deal?
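To put rough numbers on that example, here is a purely illustrative
user-space sketch (not kernel code; the core and cgroup counts are just
the ones from the example above):

#include <stdio.h>

/*
 * Illustrative arithmetic only, not kernel code: 3 cgroups with 16
 * cpu-intensive tasks each on a 16-core/32-thread (SMT2) machine.
 * Giving every task an SMT sibling from its own cgroup would need
 * more cores than exist, so at any instant either some cores mix
 * cgroups (possibly forcing a sibling idle) or a cgroup has to wait,
 * i.e. the load balancer can only approximate "perfectly balanced".
 */
int main(void)
{
        int cores = 16, smt = 2;
        int cgroups = 3, tasks_per_cgroup = 16;

        /* cores one cgroup needs to pair all its tasks on siblings */
        int cores_per_cgroup = tasks_per_cgroup / smt;  /* 8  */
        int cores_needed = cgroups * cores_per_cgroup;  /* 24 */

        printf("cores available: %d, cores needed for same-cgroup pairing: %d\n",
               cores, cores_needed);
        return 0;
}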
> > Tim propose to solve
> > this problem by doing some kind of load balancing if I'm not mistaken, I
> > haven't taken a look at this yet.
>
> My new patchset is trying to solve a different problem. It is
> not trying to maintain fairness between cgroup on a core, but tries to
> even out the load of a cgroup between threads, and even out general
> load between cores. This will minimize the forced idle time.

Understood.

> The fairness between cgroup relies still on
> proper vruntime accounting and proper comparison of vruntime between
> threads. So for now, I am still using Aaron's patchset for this purpose
> as it has better fairness property than my other proposed patchsets
> for fairness purpose.
>
> With just Aaron's current patchset we may have a lot of forced idle time
> due to the uneven distribution of tasks of different cgroup among the
> threads and cores, even though scheduling fairness is maintained.
> My new patches try to remove those forced idle time by moving the
> tasks around, to minimize cgroup unevenness between sibling threads
> and general load unevenness between the CPUs.

Yes, I think this is definitely a good thing to do.
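As a side note on the vruntime accounting mentioned above, and on the
earlier point that idle_allowance is absolute time while vruntime is
adjusted by se weight: below is a minimal user-space sketch of the idea
behind CFS's weight-scaled vruntime (what calc_delta_fair() does in
kernel/sched/fair.c). The helper name and the sample weights here are
made up for illustration only:

#include <stdio.h>

#define NICE_0_LOAD 1024ULL     /* nice-0 weight, ignoring fixed-point scaling */

/*
 * Sketch of the calc_delta_fair() idea: the same wall-clock execution
 * advances a heavier se's vruntime more slowly than a lighter se's,
 * so CPU time ends up proportional to weight.  An allowance kept in
 * absolute time cannot express this difference.
 */
static unsigned long long vruntime_delta(unsigned long long delta_exec_ns,
                                         unsigned long long weight)
{
        return delta_exec_ns * NICE_0_LOAD / weight;
}

int main(void)
{
        unsigned long long exec_ns = 1000000ULL;        /* 1ms of CPU time */

        /* e.g. a per-cpu group se holding half of its cgroup's shares */
        printf("weight 1024: vruntime += %llu ns\n", vruntime_delta(exec_ns, 1024ULL));
        printf("weight  512: vruntime += %llu ns\n", vruntime_delta(exec_ns, 512ULL));
        return 0;
}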