Date: Fri, 21 Feb 2020 11:12:23 +0000
From: Qais Yousef
To: Pavan Kondeti
Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Dietmar Eggemann,
    Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU
Message-ID: <20200221111223.bb4y57zdox7qdy2w@e107158-lin.cambridge.arm.com>
References: <20200214163949.27850-1-qais.yousef@arm.com>
 <20200214163949.27850-4-qais.yousef@arm.com>
 <20200217092329.GC28029@codeaurora.org>
 <20200217135306.cjc2225wdlwqiicu@e107158-lin.cambridge.arm.com>
 <20200219140243.wfljmupcrwm2jelo@e107158-lin>
 <20200221081551.GG28029@codeaurora.org>
In-Reply-To: <20200221081551.GG28029@codeaurora.org>

On 02/21/20 13:45, Pavan Kondeti wrote:
> On Wed, Feb 19, 2020 at 02:02:44PM +0000, Qais Yousef wrote:
> > On 02/17/20 13:53, Qais Yousef wrote:
> > > On 02/17/20 14:53, Pavan Kondeti wrote:
> > > > I noticed a case where tasks migrate for no reason (it happens without
> > > > this patch also). Assume the BIG cores are busy with other RT tasks.
> > > > Now this RT task can go to *any* little CPU; there is no bias towards
> > > > its previous CPU. I don't know if it makes any difference, but I would
> > > > expect RT task placement to be keen on reducing migrations unless they
> > > > are absolutely needed.
> > >
> > > In find_lowest_rq() there's a check whether task_cpu(p) is in the
> > > lowest_mask, and we prefer it if it is.
> > >
> > > But yeah, I see it happening too:
> > >
> > > https://imgur.com/a/FYqLIko
> > >
> > > Tasks on CPUs 0 and 3 swap. Note that my tasks are periodic, but the
> > > plots don't show that.
> > >
> > > I don't believe I changed anything that affects this bias. Do you think
> > > it's something I introduced?
> > >
> > > It's maybe something worth digging into, though. I'll try to have a look.
> >
> > FWIW, I dug into this a bit and found out we have a thundering herd issue.
> >
> > Since I just have a set of periodic tasks that all start together,
> > select_task_rq_rt() ends up selecting the same fitting CPU for all of
> > them (CPU1). They all end up waking up on CPU1, only to get pushed out
> > again with only one surviving.
> >
> > This reshuffles the task placement, ending with some tasks being swapped.
> >
> > I don't think this problem is specific to my change; it could happen
> > without it.
> >
> > The problem is caused by the way find_lowest_rq() selects a CPU in the
> > mask:
> >
> > 1750			best_cpu = cpumask_first_and(lowest_mask,
> > 1751						     sched_domain_span(sd));
> > 1752			if (best_cpu < nr_cpu_ids) {
> > 1753				rcu_read_unlock();
> > 1754				return best_cpu;
> > 1755			}
> >
> > It always returns the first CPU in the mask (or the mask could contain
> > only a single CPU). The end result is that we most likely end up herding
> > all the tasks that wake up simultaneously onto the same CPU.
> >
> > I'm not sure how to fix this problem yet.
>
> Yes, I have seen this problem too. It is not limited to RT; the fair class
> (the find_energy_efficient_cpu path) also has the same issue. There is a
> window between selecting a CPU for a task and the task being queued there.
> Because of this, we may select the same CPU for two successively waking
> tasks. Turning off the TTWU_QUEUE sched feature addresses this to some
> extent. At least it would solve cases like multiple tasks getting woken up
> from an interrupt handler.

Oh, handy. Let me try this out. I've added it to my to-do list to
investigate when I have time anyway.

On modern systems where the L3 spans all CPUs, the migration isn't that
costly, but it'd still be unnecessary wakeup latency that can add up.

Thanks

--
Qais Yousef
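
P.S. For anyone wanting to try Pavan's suggestion: on kernels built with
CONFIG_SCHED_DEBUG, the feature should be toggleable at runtime by writing
NO_TTWU_QUEUE (and TTWU_QUEUE to re-enable) to
/sys/kernel/debug/sched_features.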
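
P.P.S. To make the herding effect concrete, below is a standalone userspace
sketch (illustrative only, not kernel code; pick_first() and
pick_distributed() are made-up names). It mimics several simultaneous
wakeups each taking the first CPU of an unchanged lowest_mask, versus a
rotating starting point (the kind of thing a cpumask_any_and_distribute()-
style helper does):

#include <stdio.h>

#define NR_CPUS 8

/* Mimic cpumask_first_and(): always return the lowest set bit. */
static int pick_first(unsigned int mask)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return -1;
}

/* Rotate the starting CPU across calls so back-to-back picks spread out. */
static int pick_distributed(unsigned int mask)
{
	static int next;	/* where the previous search stopped */

	for (int i = 0; i < NR_CPUS; i++) {
		int cpu = (next + i) % NR_CPUS;

		if (mask & (1u << cpu)) {
			next = cpu + 1;
			return cpu;
		}
	}
	return -1;
}

int main(void)
{
	/* lowest_mask: say CPUs 1, 2 and 3 fit and are free. */
	unsigned int lowest_mask = 0x0e;

	/*
	 * Four tasks wake at the same time; none has been enqueued yet,
	 * so the mask doesn't change between the picks.
	 */
	for (int task = 0; task < 4; task++)
		printf("task %d: first=%d distributed=%d\n", task,
		       pick_first(lowest_mask),
		       pick_distributed(lowest_mask));

	/* first= is always 1 (the herd); distributed= cycles 1, 2, 3, 1. */
	return 0;
}

Of course, whether a rotating pick is acceptable in find_lowest_rq() is a
separate question; it walks the sched domains for a reason, so this only
illustrates the herding, not a fix.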