Date: Fri, 25 May 2018 14:52:17 +0200
From: Juri Lelli
To: Patrick Bellasi
Cc: Waiman Long, Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra,
    Ingo Molnar, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org, kernel-team@fb.com, pjt@google.com,
    luto@amacapital.net, Mike Galbraith, torvalds@linux-foundation.org,
    Roman Gushchin
Subject: Re: [PATCH v8 4/6] cpuset: Make generate_sched_domains() recognize isolated_cpus
Message-ID: <20180525125217.GC678@localhost.localdomain>
References: <1526590545-3350-1-git-send-email-longman@redhat.com>
    <1526590545-3350-5-git-send-email-longman@redhat.com>
    <20180523173453.GY30654@e110439-lin>
    <20180524090430.GZ30654@e110439-lin>
    <20180524103938.GB3948@localhost.localdomain>
    <20180525103147.GC30654@e110439-lin>
In-Reply-To: <20180525103147.GC30654@e110439-lin>
X-Mailing-List: linux-kernel@vger.kernel.org

On 25/05/18 11:31, Patrick Bellasi wrote:

[...]

> Right, so the problem seems to be that we "need" to call
> arch_update_cpu_topology() and we do that by calling
> partition_sched_domains() which was initially introduced by:
>
>    029190c515f1 ("cpuset sched_load_balance flag")
>
> back in 2007, where it's also quite well explained the reasons behind
> the sched_load_balance flag and the idea to have "partitioned" SDs.
>
> I also (hopefully) understood that there are at least two actors involved:
>
> - A) arch code
>    which creates SDs and SGs, usually to group CPUs depending on the
>    memory hierarchy, to support different time granularity of load
>    balancing operations
>
>    Special case here are HP and hibernation which, by on-/off-lining
>    CPUs they directly affect the SDs/SGs definitions.
>
> - B) cpusets
>    which expose to userspace the possibility to define,
>    _if possible_, a finer granularity set of SGs to further restrict the
>    scope of load balancing operations
>
> Since B is a "possible finer granularity" refinement of A, then we
> trigger A's reconfigurations based on B's constraints.
>
> That's why, for example, in consequence of an HP online event,
> we have:
>
>    --- core.c -------------------
>     HP[sched:active]
>      | sched_cpu_activate()
>        | cpuset_cpu_active()
>    --- cpuset.c -----------------
>          | cpuset_update_active_cpus()
>            | schedule_work(&cpuset_hotplug_work)
>              \.. System Kworker \
>                | cpuset_hotplug_workfn()
>                  if (cpus_updated || force_rebuild)
>                    | rebuild_sched_domains()
>                      | rebuild_sched_domains_locked()
>                        | generate_sched_domains()
>    --- topology.c ---------------
>                        | partition_sched_domains()
>                          | arch_update_cpu_topology()
>
>
> IOW, we need to pass via cpusets to rebuild the SDs whenever we
> there are HP events or we "need" to do an arch_update_cpu_topology()
> via the arch topology driver (drivers/base/arch_topology.c).

I don't think the arch topology driver is always involved in this (e.g.,
arch/x86/kernel/itmt::sched_itmt_update_handler()). Still we need to check
if topology changed, as you say.

> This last bit is also interesting, whenever we detect arch topology
> information that required an SD rebuild, we need to force a
> partition_sched_domains(). But, for that, in:
>
>    commit 50e76632339d ("sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs")
>
> we just introduced the support for the "force_rebuild" flag to be set.
>
> Thus, potentially we can just extend the check I've proposed to consider the
> force rebuild flag, to be something like:
>
> ---8<---
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 8f586e8bdc98..1f051fafaa3a 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -874,11 +874,19 @@ static void rebuild_sched_domains_locked(void)
>  	    !cpumask_subset(top_cpuset.effective_cpus, cpu_active_mask))
>  		goto out;
>
> +	/* Special case for the 99% of systems with one, full, sched domain */
> +	if (!force_rebuild &&
> +	    !top_cpuset.isolation_count &&
> +	    is_sched_load_balance(&top_cpuset))
> +		goto out;
> +	force_rebuild = false;
> +
>  	/* Generate domain masks and attrs */
>  	ndoms = generate_sched_domains(&doms, &attr);
>
>  	/* Have scheduler rebuild the domains */
>  	partition_sched_domains(ndoms, doms, attr);
>  out:
>  	put_online_cpus();
> ---8<---
>
>
> Which would still allow to use something like:
>
>    cpuset_force_rebuild()
>    rebuild_sched_domains()
>
> to actually rebuild SD in consequence of arch topology changes.

That might work.

>
> >
> > Maybe we could move the check you are proposing in
> > update_cpumasks_hier() ?
>
> Yes, that's another option... although there we are outside of
> get_online_cpus(). Could be a problem?

Mmm, using force_rebuild flag seems safer indeed.
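
For what it's worth, here is a minimal userspace sketch of the short-circuit
discussed above, only to illustrate how the force_rebuild flag would interact
with the new check. Everything in it is an assumption made for illustration:
struct cpuset_model, model_force_rebuild(), model_rebuild_sched_domains() and
the rebuild_calls counter are hypothetical stand-ins, not kernel code, and the
real rebuild_sched_domains_locked() also takes locks and validates cpumasks.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the bits of top_cpuset used by the check. */
struct cpuset_model {
	int isolation_count;     /* models top_cpuset.isolation_count */
	bool sched_load_balance; /* models is_sched_load_balance(&top_cpuset) */
};

/* Models the force_rebuild flag; set by callers that need a rebuild. */
static bool force_rebuild;

/* Counts how many times the (simulated) domain rebuild actually ran. */
static int rebuild_calls;

/* Models cpuset_force_rebuild(): ask for the next rebuild to be unconditional. */
static void model_force_rebuild(void)
{
	force_rebuild = true;
}

/*
 * Models the proposed early exit: skip the rebuild when nobody forced it,
 * there are no isolated CPUs and the top cpuset is load balanced.
 * Returns true if the simulated rebuild was performed.
 */
static bool model_rebuild_sched_domains(const struct cpuset_model *top)
{
	if (!force_rebuild &&
	    !top->isolation_count &&
	    top->sched_load_balance)
		return false;

	/* The flag is consumed by the rebuild that honours it. */
	force_rebuild = false;

	/* Stand-in for generate_sched_domains() + partition_sched_domains(). */
	rebuild_calls++;
	return true;
}
```

With a default top cpuset (no isolated CPUs, load balancing on), a plain call
returns false and skips the rebuild. Calling model_force_rebuild() first makes
the next call rebuild once, after which the flag is cleared again.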