Subject: Re: [RFC 00/60] Coscheduling for Linux
From: Jan H. Schönherr
To: Frederic Weisbecker
Cc: Ingo Molnar, Peter Zijlstra, linux-kernel@vger.kernel.org,
    Rik van Riel, Subhra Mazumdar
Date: Fri, 19 Oct 2018 13:40:03 +0200
In-Reply-To: <20181017020933.GC24723@lerouge>
References: <20180907214047.26914-1-jschoenh@amazon.de>
    <20181017020933.GC24723@lerouge>
X-Mailing-List: linux-kernel@vger.kernel.org

On 17/10/2018 04.09, Frederic Weisbecker wrote:
> On Fri, Sep 07, 2018 at 11:39:47PM +0200, Jan H. Schönherr wrote:
>> C) How does it work?
>> --------------------
[...]
>> For each task-group, the user can select at which level it should be
>> scheduled. If you set "cpu.scheduled" to "1", coscheduling will typically
>> happen at core-level on systems with SMT. That is, if one SMT sibling
>> executes a task from this task group, the other sibling will do so, too. If
>> no task is available, the SMT sibling will be idle. With "cpu.scheduled"
>> set to "2" this is extended to the next level, which is typically a whole
>> socket on many systems. And so on. If you feel, that this does not provide
>> enough flexibility, you can specify "cosched_split_domains" on the kernel
>> command line to create more fine-grained scheduling domains for your
>> system.
>
> Have you considered using cpuset to specify the set of CPUs inside which
> you want to coschedule task groups in? Perhaps that would be more flexible
> and intuitive to control than this cpu.scheduled value.

Yes, I did consider cpusets. There are two dimensions to it, though:
a) at what fraction of the system tasks shall be coscheduled, and
b) where these tasks shall execute within the system.

cpusets would be the obvious answer to the "where". However, in their
current form they are too inflexible and carry too much overhead. Suppose
you want to coschedule two tasks on the SMT siblings of a core. You could
restrict the tasks to a specific core with a cpuset. But then the group is
bound to that core, and the load balancer cannot move the group of two
tasks to a different core.

Now, it would be possible to "invent" relocatable cpusets to address that
issue ("I want affinity restricted to a core, I don't care which"), but
the way cpuset affinity is currently enforced doesn't scale well enough to
be used from within the balancer. (The upcoming load balancing portion of
the coscheduler currently uses a file similar to cpu.scheduled to restrict
affinity to a load-balancer-controlled subset of the system.)

Using cpusets as the means to describe which parts of the system are to be
coscheduled *may* be possible. But if so, it's a long way out. The current
implementation uses scheduling domains for this, because (a) most
coscheduling use cases require an alignment to the topology, and (b) it
integrates really nicely with the load balancer.

AFAIK, there is already some interaction between cpusets and scheduling
domains. But it is supposed to be rather static, and as soon as you have
overlapping cpusets, you end up with the default scheduling domains. If we
were able to make the scheduling domains more dynamic than they are today,
we might be able to couple that to cpusets (or some similar interface to
*define* scheduling domains).

Regards
Jan
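
P.S. The two dimensions discussed above (what fraction of the system is
coscheduled vs. where the tasks execute) can be illustrated with a toy
model. This is only a sketch and not code from the patch set: the topology
(2 sockets x 2 cores x 2 SMT threads), the contiguous CPU numbering, and
the names LEVEL_SPAN, cosched_group and candidate_groups are all
assumptions made for illustration.

```python
# Toy topology (an assumption): 2 sockets x 2 cores x 2 SMT threads,
# with CPUs numbered contiguously so that SMT siblings are adjacent.
THREADS_PER_CORE = 2
CORES_PER_SOCKET = 2
SOCKETS = 2
NR_CPUS = THREADS_PER_CORE * CORES_PER_SOCKET * SOCKETS

# Number of CPUs spanned by one group at each "cpu.scheduled" level:
# 1 = core (SMT siblings), 2 = socket, as described in the quoted text.
LEVEL_SPAN = {
    1: THREADS_PER_CORE,
    2: THREADS_PER_CORE * CORES_PER_SOCKET,
}


def cosched_group(cpu, level):
    """Return the CPUs that are coscheduled together with `cpu` at `level`."""
    span = LEVEL_SPAN[level]
    first = (cpu // span) * span
    return set(range(first, first + span))


def candidate_groups(level):
    """Return all groups of this level, i.e. possible balancer targets."""
    span = LEVEL_SPAN[level]
    return [set(range(first, first + span))
            for first in range(0, NR_CPUS, span)]


# "What fraction": level 1 ties the two SMT siblings of a core together.
print(cosched_group(5, 1))       # {4, 5}

# "Where": a cpuset would pin the group to one fixed core ...
pinned = {4, 5}
# ... whereas a level only fixes the size of the group and leaves the
# choice among all cores of the toy system open.
print(len(candidate_groups(1)))  # 4
```

In this model a cpuset corresponds to picking `pinned` once and for all,
while a level-based setting only selects the group size and leaves the
placement among `candidate_groups(level)` to the load balancer.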