Date: Tue, 18 Jul 2023 14:48:20 +0200
From: Vincent Guittot
To: Qais Yousef
Cc: Ingo Molnar, Peter Zijlstra, Dietmar Eggemann, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] sched/fair: Fix impossible migrate_util scenario in load balance
References: <20230716014125.139577-1-qyousef@layalina.io>
In-Reply-To: <20230716014125.139577-1-qyousef@layalina.io>

On Sunday 16 Jul 2023 at 02:41:25 (+0100), Qais Yousef wrote:
> We've seen cases while running geekbench where an idle little core never
> pulls a task from a bigger overloaded cluster for 100s of ms and
> sometimes over a second.
>
> It turned out that the load balance identifies this as a migrate_util
> type since the local group (little cluster) has spare capacity and
> will try to pull a task. But the little cluster capacity is very small
> nowadays (around 200 or less) and if two busy tasks are stuck on a mid
> core which has a capacity of over 700, this means the util of each task
> will be in the 350+ range, which is always bigger than the spare
> capacity of the little group with a single idle core.
>
> When trying to detach_tasks() we then bail out because of the comparison:
>
> 	if (util > env->imbalance)
> 		goto next;
>
> In calculate_imbalance() we convert a migrate_util into a migrate_task
> type if the CPU trying to do the pull is idle. But we only do this if
> env->imbalance is 0, which I can't understand. AFAICT env->imbalance
> contains the local group's spare capacity. If it is 0, this means it's
> fully busy.
>
> Removing this condition fixes the problem, but since I can't fully
> understand why it checks for 0, I'm sending this as an RFC. It could be
> a typo and meant to check for
>
> 	env->imbalance != 0
>
> instead?
>
> Signed-off-by: Qais Yousef (Google)
> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index a80a73909dc2..682d9d6a8691 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10288,7 +10288,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>  			 * waiting task in this overloaded busiest group. Let's
>  			 * try to pull it.
>  			 */
> -			if (env->idle != CPU_NOT_IDLE && env->imbalance == 0) {
> +			if (env->idle != CPU_NOT_IDLE) {

With this change you completely skip migrate_util in the idle and newly
idle cases, and this would be too aggressive.

We can do something similar to migrate_load in detach_tasks():

---
 kernel/sched/fair.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d3df5b1642a6..64111ac7e137 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8834,7 +8834,13 @@ static int detach_tasks(struct lb_env *env)
 		case migrate_util:
 			util = task_util_est(p);
 
-			if (util > env->imbalance)
+			/*
+			 * Make sure that we don't migrate too much utilization.
+			 * Nevertheless, let's relax the constraint if the
+			 * scheduler fails to find a good waiting task to
+			 * migrate.
+			 */
+			if (shr_bound(util, env->sd->nr_balance_failed) > env->imbalance)
 				goto next;
 
 			env->imbalance -= util;
--

> 			env->migration_type = migrate_task;
> 			env->imbalance = 1;
> 		}
> --
> 2.25.1
>
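
To make the effect of the relaxed check concrete, here is a minimal userspace sketch using the numbers from the report (task util around 350, little-group spare capacity around 200). It is only a model and not the kernel code: shr_bound_ul() and skip_task() are hypothetical names made up for this illustration, and using unsigned long for both operands is an assumption of the sketch.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Userspace stand-in for the kernel's shr_bound() (hypothetical name):
 * shift right, but clamp the shift count below the width of the type,
 * because shifting by the full width or more is undefined in C.
 */
static unsigned long shr_bound_ul(unsigned long val, unsigned int shift)
{
	const unsigned int max_shift = sizeof(val) * CHAR_BIT - 1;

	return val >> (shift < max_shift ? shift : max_shift);
}

/*
 * Hypothetical helper modelling the proposed migrate_util test:
 * returns true when the task would be skipped ("goto next").
 * Each failed balance attempt halves the utilization that is
 * compared against the remaining imbalance.
 */
static bool skip_task(unsigned long util, unsigned long imbalance,
		      unsigned int nr_balance_failed)
{
	return shr_bound_ul(util, nr_balance_failed) > imbalance;
}
```

With util = 350 and imbalance = 200, skip_task() returns true while nr_balance_failed is 0, so the first attempt still refuses the task. After one failed attempt the compared value is 350 >> 1 = 175, which fits in 200, so the task becomes eligible for the pull.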