Date: Mon, 17 Apr 2023 15:52:52 -0700
From: Ricardo Neri
To: Vincent Guittot
Cc: "Peter Zijlstra (Intel)", Juri Lelli, Ricardo Neri, "Ravi V. Shankar",
    Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
    Mel Gorman, "Rafael J. Wysocki", Srinivas Pandruvada, Steven Rostedt,
    Tim Chen, Valentin Schneider, Lukasz Luba, Ionela Voinescu,
    x86@kernel.org, "Joel Fernandes (Google)", linux-kernel@vger.kernel.org,
    linux-pm@vger.kernel.org, "Tim C . Chen"
Subject: Re: [PATCH v3 07/24] sched/fair: Compute IPC class scores for load balancing
Message-ID: <20230417225252.GA6156@ranerica-svr.sc.intel.com>
References: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com>
 <20230207051105.11575-8-ricardo.neri-calderon@linux.intel.com>
 <20230330020724.GA26315@ranerica-svr.sc.intel.com>

On Fri, Mar 31, 2023 at 02:20:11PM +0200, Vincent Guittot wrote:
> On Thu, 30 Mar 2023 at 03:56, Ricardo Neri wrote:
> >
> > On Tue, Mar 28, 2023 at 12:00:58PM +0200, Vincent Guittot wrote:
> > > On Tue, 7 Feb 2023 at 06:01, Ricardo Neri wrote:
> > > >
> > > > Compute the joint total (both current and prospective) IPC class
> > > > score of a scheduling group and the local scheduling group.
> > > >
> > > > These IPCC statistics are used during idle load balancing. The
> > > > candidate scheduling group will have one fewer busy CPU after load
> > > > balancing. This observation is important for cores with SMT support.
> > > >
> > > > The IPCC score of scheduling groups composed of SMT siblings needs
> > > > to consider that the siblings share CPU resources. When computing
> > > > the total IPCC score of the scheduling group, divide the score of
> > > > each sibling by the number of busy siblings.
> > > >
> > > > Collect IPCC statistics for asym_packing and fully_busy scheduling
> > > > groups.
> > >
> > > IPCC statistics collect the scores of current tasks, so they are
> > > meaningful only when trying to migrate one of those running tasks.
> > > Using such a score when pulling other tasks is meaningless, and I
> > > don't see how you ensure the correct use of the IPCC score.
> >
> > Thank you very much for your feedback, Vincent!
> >
> > It is true that the task that is current when collecting statistics
> > may be different from the task that is current when we are ready to
> > pluck tasks.
> >
> > Using IPCC scores for load balancing benefits large, long-running
> > tasks the most. For these tasks, the current task is likely to remain
> > the same at the two mentioned points in time.
>
> My point was mainly about the fact that the current running task is the
> last one to be pulled, and it is pulled only when no other task could
> be pulled instead.

(Thanks again for your feedback, Vincent. I am sorry for the late reply;
I needed some more time to think about it.)

Good point! It is smarter to compare and pull from the back of the queue
rather than to compare curr and pull from the back. We are more likely to
break the tie correctly without adding too much complexity. Here is an
incremental patch with the update. I'll include this change in my next
version.
@@ -9281,24 +9281,42 @@ static void init_rq_ipcc_stats(struct sg_lb_stats *sgs)
 	sgs->min_score = ULONG_MAX;
 }
 
+static int rq_last_task_ipcc(int dst_cpu, struct rq *rq, unsigned short *ipcc)
+{
+	struct list_head *tasks = &rq->cfs_tasks;
+	struct task_struct *p;
+	struct rq_flags rf;
+	int ret = -EINVAL;
+
+	rq_lock_irqsave(rq, &rf);
+	if (list_empty(tasks))
+		goto out;
+
+	p = list_last_entry(tasks, struct task_struct, se.group_node);
+	if (p->flags & PF_EXITING || is_idle_task(p) ||
+	    !cpumask_test_cpu(dst_cpu, p->cpus_ptr))
+		goto out;
+
+	ret = 0;
+	*ipcc = p->ipcc;
+out:
+	rq_unlock_irqrestore(rq, &rf);
+	return ret;
+}
+
 /* Called only if cpu_of(@rq) is not idle and has tasks running. */
 static void update_sg_lb_ipcc_stats(int dst_cpu, struct sg_lb_stats *sgs,
 				    struct rq *rq)
 {
-	struct task_struct *curr;
 	unsigned short ipcc;
 	unsigned long score;
 
 	if (!sched_ipcc_enabled())
 		return;
 
-	curr = rcu_dereference(rq->curr);
-	if (!curr || (curr->flags & PF_EXITING) || is_idle_task(curr) ||
-	    task_is_realtime(curr) ||
-	    !cpumask_test_cpu(dst_cpu, curr->cpus_ptr))
+	if (rq_last_task_ipcc(dst_cpu, rq, &ipcc))
 		return;
 
-	ipcc = curr->ipcc;
 	score = arch_get_ipcc_score(ipcc, cpu_of(rq));

> >
> > My patchset proposes to use IPCC classes to break ties between
> > otherwise identical sched groups in update_sd_pick_busiest(). Its use
> > is limited to the asym_packing and fully_busy types. For these types,
> > it is likely that there will not be tasks wanting to run other than
> > current. need_active_balance() will return true and we will migrate
> > the current task.
>
> I disagree with your assumption above; the asym_packing and fully_busy
> types do not imply anything about the number of running tasks.

Agreed. What I stated was not correct.

> >
> > You are correct: by only looking at the current tasks we risk
> > overlooking other tasks in the queue and the statistics becoming
> > meaningless.
> > A fully correct solution would need to keep track of the types of
> > tasks in all runqueues as they come and go. IMO, the increased
> > complexity of such an approach does not justify the benefit. We give
> > the load balancer extra information to decide between otherwise
> > identical sched groups using the IPCC statistics of big tasks.
>
> Because IPCC scores are meaningful only when there is a single running
> task and during active migration, you should collect them only in those
> situations.

I think that if we compute the IPCC statistics using the tasks at the
back of the runqueue, then the IPCC statistics remain meaningful for
nr_running >= 1.