Date: Fri, 21 Feb 2020 13:45:51 +0530
From: Pavan Kondeti
To: Qais Yousef
Cc: Ingo Molnar, Peter Zijlstra, Steven Rostedt, Dietmar Eggemann,
    Juri Lelli, Vincent Guittot, Ben Segall, Mel Gorman,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU
Message-ID: <20200221081551.GG28029@codeaurora.org>
References: <20200214163949.27850-1-qais.yousef@arm.com>
    <20200214163949.27850-4-qais.yousef@arm.com>
    <20200217092329.GC28029@codeaurora.org>
    <20200217135306.cjc2225wdlwqiicu@e107158-lin.cambridge.arm.com>
    <20200219140243.wfljmupcrwm2jelo@e107158-lin>
In-Reply-To: <20200219140243.wfljmupcrwm2jelo@e107158-lin>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Feb 19, 2020 at 02:02:44PM +0000, Qais Yousef wrote:
> On 02/17/20 13:53, Qais Yousef wrote:
> > On 02/17/20 14:53, Pavan Kondeti wrote:
> > > I notice a case where tasks would migrate for no reason (happens without this
> > > patch also). Assuming BIG cores are busy with other RT tasks.
> > > Now this RT task can go to *any* little CPU. There is no bias towards its
> > > previous CPU. I don't know if it makes any difference but I see RT task
> > > placement is too keen on reducing the migrations unless it is absolutely
> > > needed.
> >
> > In find_lowest_rq() there's a check if the task_cpu(p) is in the lowest_mask
> > and prefer it if it is.
> >
> > But yeah I see it happening too
> >
> > https://imgur.com/a/FYqLIko
> >
> > Tasks on CPU 0 and 3 swap. Note that my tasks are periodic but the plots don't
> > show that.
> >
> > I shouldn't have changed something to affect this bias. Do you think it's
> > something I introduced?
> >
> > It's something maybe worth digging into though. I'll try to have a look.
>
> FWIW, I dug a bit into this and I found out we have a thundering herd issue.
>
> Since I just have a set of periodic tasks that all start together,
> select_task_rq_rt() ends up selecting the same fitting CPU for all of them
> (CPU1). They end up all waking up on CPU1, only to get pushed back out
> again with only one surviving.
>
> This reshuffles the task placement ending with some tasks being swapped.
>
> I don't think this problem is specific to my change and could happen without
> it.
>
> The problem is caused by the way find_lowest_rq() selects a cpu in the mask
>
> 	best_cpu = cpumask_first_and(lowest_mask,
> 				     sched_domain_span(sd));
> 	if (best_cpu < nr_cpu_ids) {
> 		rcu_read_unlock();
> 		return best_cpu;
> 	}
>
> It always returns the first CPU in the mask. Or the mask could only contain
> a single CPU too. The end result is that we most likely end up herding all the
> tasks that wake up simultaneously to the same CPU.
>
> I'm not sure how to fix this problem yet.
>

Yes, I have seen this problem too. It is not limited to RT; the fair class
(find_energy_efficient_cpu() path) has the same issue. There is a window
between selecting a CPU for a task and the task actually being queued there.
Because of this, we may select the same CPU for two successive waking tasks.

Turning off the TTWU_QUEUE sched feature addresses this to some extent. At
least it would solve cases like multiple tasks getting woken up from an
interrupt handler.

Thanks,
Pavan

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project.