Date: Fri, 19 Oct 2018 16:52:24 +0200
From: Frederic Weisbecker
To: Jan H. Schönherr
Cc: Ingo Molnar, Peter Zijlstra, linux-kernel@vger.kernel.org, Rik van Riel, Subhra Mazumdar
Subject: Re: [RFC 00/60] Coscheduling for Linux
Message-ID: <20181019145223.GA15416@lerouge>
References:
 <20180907214047.26914-1-jschoenh@amazon.de> <20181017020933.GC24723@lerouge>

On Fri, Oct 19, 2018 at 01:40:03PM +0200, Jan H. Schönherr wrote:
> On 17/10/2018 04.09, Frederic Weisbecker wrote:
> > On Fri, Sep 07, 2018 at 11:39:47PM +0200, Jan H. Schönherr wrote:
> >> C) How does it work?
> >> --------------------
> [...]
> >> For each task-group, the user can select at which level it should be
> >> scheduled. If you set "cpu.scheduled" to "1", coscheduling will typically
> >> happen at core-level on systems with SMT. That is, if one SMT sibling
> >> executes a task from this task group, the other sibling will do so, too. If
> >> no task is available, the SMT sibling will be idle. With "cpu.scheduled"
> >> set to "2" this is extended to the next level, which is typically a whole
> >> socket on many systems. And so on. If you feel, that this does not provide
> >> enough flexibility, you can specify "cosched_split_domains" on the kernel
> >> command line to create more fine-grained scheduling domains for your
> >> system.
> >
> > Have you considered using cpuset to specify the set of CPUs inside which
> > you want to coschedule task groups in? Perhaps that would be more flexible
> > and intuitive to control than this cpu.scheduled value.
>
> Yes, I did consider cpusets. Though, there are two dimensions to it:
> a) at what fraction of the system tasks shall be coscheduled, and
> b) where these tasks shall execute within the system.
>
> cpusets would be the obvious answer to the "where". However, in the current
> form they are too inflexible with too much overhead. Suppose, you want to
> coschedule two tasks on SMT siblings of a core.
> You would be able to
> restrict the tasks to a specific core with a cpuset. But then, it is bound
> to that core, and the load balancer cannot move the group of two tasks to a
> different core.
>
> Now, it would be possible to "invent" relocatable cpusets to address that
> issue ("I want affinity restricted to a core, I don't care which"), but
> then, the current way how cpuset affinity is enforced doesn't scale for
> making use of it from within the balancer. (The upcoming load balancing
> portion of the coscheduler currently uses a file similar to cpu.scheduled
> to restrict affinity to a load-balancer-controlled subset of the system.)

Oh, OK, I understand now: affinity and node-scope mutual exclusion are
entirely decoupled.

>
> Using cpusets as the mean to describe which parts of the system are to be
> coscheduled *may* be possible. But if so, it's a long way out. The current
> implementation uses scheduling domains for this, because (a) most
> coscheduling use cases require an alignment to the topology, and (b) it
> integrates really nicely with the load balancer.

So what is the need for cosched_split_domains? What kind of corner case
doesn't fit into scheduler domains? Could you perhaps leave that part out
of this patchset to simplify it? If it turns out to be necessary, it can
still be added later, iteratively.

Thanks.
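
For readers following the thread, the level semantics quoted above (level 1
is typically a core with its SMT siblings, level 2 typically a socket) can be
sketched as a toy model. This is only an illustration under stated
assumptions: the 8-CPU topology table and the names Cpu, TOPOLOGY,
cosched_group and candidate_groups are hypothetical and are not taken from
the patch set or from any kernel interface.

```python
# Toy model only: topology and function names are hypothetical illustrations,
# not code or interfaces from the coscheduling patch set.
from typing import Dict, FrozenSet, NamedTuple, Set


class Cpu(NamedTuple):
    core: int    # physical core this hardware thread belongs to
    socket: int  # socket (package) the core belongs to


# Assumed machine: 2 sockets, 2 cores per socket, 2 SMT threads per core.
TOPOLOGY: Dict[int, Cpu] = {
    cpu: Cpu(core=cpu // 2, socket=cpu // 4) for cpu in range(8)
}


def cosched_group(cpu: int, level: int) -> FrozenSet[int]:
    """Return the CPUs that would run the same task group as `cpu`.

    level 1: all SMT siblings of the same core
    level 2: all CPUs of the same socket
    """
    me = TOPOLOGY[cpu]
    if level == 1:
        return frozenset(c for c, t in TOPOLOGY.items() if t.core == me.core)
    if level == 2:
        return frozenset(c for c, t in TOPOLOGY.items() if t.socket == me.socket)
    raise ValueError(f"unsupported level in this toy model: {level}")


def candidate_groups(level: int) -> Set[FrozenSet[int]]:
    """All distinct CPU groups at `level`.

    A level only says how large the coscheduled unit is, so any of these
    groups is a valid placement. A cpuset pinned to one core names exactly
    one of them.
    """
    return {cosched_group(cpu, level) for cpu in TOPOLOGY}


print(sorted(cosched_group(5, 1)))
print(sorted(cosched_group(5, 2)))
print(len(candidate_groups(1)), len(candidate_groups(2)))
```

In this model the level fixes only the size of the coscheduled unit (the
"fraction of the system"), while the choice among candidate_groups(level) is
the "where" that a load balancer would remain free to change.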