From: Vincent Guittot
Date: Wed, 31 Jan 2024 14:55:31 +0100
Subject: Re: [PATCH v4 1/2] sched/fair: Check a task has a fitting cpu when updating misfit
To: Qais Yousef
Cc: Ingo Molnar, Peter Zijlstra, Dietmar Eggemann, linux-kernel@vger.kernel.org, Pierre Gondois
X-Mailing-List: linux-kernel@vger.kernel.org
In-Reply-To: <20240130235727.wj3texzo4lpbba6b@airbuntu>

On Wed, 31 Jan 2024 at 00:57, Qais Yousef
wrote:
>
> On 01/30/24 10:41, Vincent Guittot wrote:
> > On Mon, 29 Jan 2024 at 00:50, Qais Yousef wrote:
> > >
> > > On 01/26/24 15:08, Vincent Guittot wrote:
> > >
> > > > > TBH I had a bit of confirmation bias that this is a problem based on the fix
> > > > > (0ae78eec8aa6) that we had in the past. So on verification I looked at
> > > > > balance_interval and this reproducer which is not the same as the original
> > > > > one and it might be exposing another problem and I didn't think twice about it.
> > > >
> > > > I checked the behavior more deeply and I confirm that I don't see
> > > > improvement for the use case described above. I would say that it's
> > > > even worse as I can see some runs where the task stays on little
> > > > whereas a big core has been added in the affinity. Having in mind that
> > > > my system is pretty idle which means that there is almost no other
> > > > reason to trigger an ilb than the misfit task, the change in
> > > > check_misfit_status() is probably the reason for never kicking an ilb
> > > > for such case
> > >
> > > It seems I reproduced another problem while trying to reproduce the original
> > > issue, eh.
> > >
> > > I did dig more and from what I see the issue is that the rd->overload is not
> > > being set correctly. Which I believe what causes the delays (see attached
> > > picture how rd.overloaded is 0 with some spikes). Only when CPU7
> > > newidle_balance() coincided with rd->overload being 1 that the migration
> > > happens. With the below hack I can see that rd->overload is 1 all the time
> >
> > But here you rely on another activity happening in CPU7 whereas the
>
> I don't want to rely on that. I think this is a problem too. And this is what
> ends up happening from what I see, sometimes at least.
>
> When is it expected for newidle_balance to pull anyway? I agree we shouldn't
> rely on it to randomly happen, but if it happens sooner, it should pull, no?
>
> > misfit should trigger by itself the load balance and not expect
> > another task waking up then sleeping on cpu7 to trigger a newidle
> > balance. We want a normal idle load balance not a newidle_balance
>
> I think there's a terminology problem. I thought you mean newidle_balance() by
> ilb. It seems you're referring to load_balance() called from
> rebalance_domains() when tick happens at idle?

newidle_balance is different from idle load balance. newidle_balance
happens when the cpu becomes idle, whereas busy and idle load balance
happen at tick.

>
> I thought this is not kicking. But I just double checked in my traces and I was
> getting confused because I was looking at where run_rebalance_domains() would
> happen, for example, on CPU2 but the balance would actually be for CPU7.

An idle load balance happens either on the target CPU, if its tick is
not stopped, or we kick one idle CPU to run the idle load balance on
behalf of all idle CPUs. It is the latter case that no longer happens
with your patch and the change in check_misfit_status().

>
> No clue why it fails to pull.. I can see actually we call load_balance() twice
> for some (not all) entries to rebalance_domains(). So we don't always operate
> on the two domains. But that's not necessarily a problem.

We have 3 different reasons for kicking an idle load balance:
- to do an actual balance of tasks
- to update stats, i.e. blocked load
- to update nohz.next_balance

You are interested in the 1st one, but it's most probably for the last
2 reasons that this happens.

>
> I think it's a good opportunity to add some tracepoints to help break this path
> down. If you have suggestions of things to record that'd be helpful.
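
To make the three kick reasons listed above concrete, here is a minimal user-space sketch in Python. It is only an illustration of the reasoning in this mail, not kernel code: KickReason, kick_reasons(), will_migrate_tasks() and their boolean parameters are all hypothetical names invented for this example, and the conditions are deliberately simplified. It shows why seeing an idle load balance get kicked does not by itself mean that a task will be pulled.

```python
from enum import Flag


class KickReason(Flag):
    """Hypothetical flags: why an idle load balance (ilb) is kicked."""
    NONE = 0
    BALANCE = 1        # do an actual balance of tasks
    STATS = 2          # update stats, i.e. blocked load
    NEXT_BALANCE = 4   # update nohz.next_balance


def kick_reasons(needs_task_balance, blocked_load_stale, next_balance_due):
    """Combine the reasons for a kick (simplified, assumed inputs)."""
    reasons = KickReason.NONE
    if needs_task_balance:
        reasons |= KickReason.BALANCE
    if blocked_load_stale:
        reasons |= KickReason.STATS
    if next_balance_due:
        reasons |= KickReason.NEXT_BALANCE
    return reasons


def will_migrate_tasks(reasons):
    """Only a kick that carries BALANCE can end up pulling a task."""
    return bool(reasons & KickReason.BALANCE)


if __name__ == "__main__":
    # A kick for stats and next_balance only: the ilb runs, nothing moves.
    housekeeping = kick_reasons(False, True, True)
    print(housekeeping, will_migrate_tasks(housekeeping))

    # A kick that asks for a real balance, e.g. because of a misfit task.
    misfit = kick_reasons(True, False, False)
    print(misfit, will_migrate_tasks(misfit))
```

In this model a trace that shows the ilb running can still correspond to the housekeeping case, which matches the observation that rebalance_domains() runs without the misfit task being pulled.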