Subject: Re: [PATCH 00/10] steal tasks to improve CPU utilization
To: Valentin Schneider, Peter Zijlstra
Cc: mingo@redhat.com, subhra.mazumdar@oracle.com, dhaval.giani@oracle.com,
    daniel.m.jordan@oracle.com, pavel.tatashin@microsoft.com,
    matt@codeblueprint.co.uk, umgwanakikbuti@gmail.com, riel@redhat.com,
    jbacik@fb.com, juri.lelli@redhat.com, linux-kernel@vger.kernel.org
References: <1540220381-424433-1-git-send-email-steven.sistare@oracle.com>
    <20181022170421.GF3117@worktop.programming.kicks-ass.net>
    <8e38ce84-ec1a-aef7-4784-462ef754f62a@oracle.com>
    <09b10abc-8357-2db3-3d30-8aa9e95e8655@arm.com>
From: Steven Sistare
Organization: Oracle Corporation
Message-ID: <495866f6-6ab8-55fa-1743-1b6910f94733@oracle.com>
Date: Thu, 25 Oct 2018 08:21:58 -0400
In-Reply-To: <09b10abc-8357-2db3-3d30-8aa9e95e8655@arm.com>
On 10/25/2018 7:31 AM, Valentin Schneider wrote:
>
> On 24/10/2018 20:27, Steven Sistare wrote:
> [...]
>> Hi Valentin,
>>
>> Asymmetric systems could maintain a separate bitmap for misfits; set a bit
>> when a CPU goes on CPU, clear it going off. When a fast CPU goes new idle,
>> it would first search the misfits mask, then search cfs_overload_cpus.
>> The misfits logic would be conditionalized with CONFIG or sched feat static
>> branches so symmetric systems do not incur extra overhead.
>
> That sounds reasonable - besides, misfit already introduces a
> sched_asym_cpucapacity static key. I'll try to play around with that.
>
>>> We'd also lose the NOHZ update done in idle_balance(), though I think it's
>>> not such a big deal - we were piggy-backing this on idle_balance() just
>>> because it happened to be convenient, and we still have NOHZ_STATS_KICK
>>> anyway.
>>
>> Agreed.
>>
>>> Another thing - in your test cases, what is the most prevalent cause of
>>> failure to pull a task in idle_balance()? Is it the load_balance() itself
>>> that fails to find a task (e.g. because the imbalance is not deemed big
>>> enough), or is it the idle migration cost logic that prevents
>>> load_balance() from running to completion?
>>
>> The latter. Eg, for the test "X6-2, 40 CPUs, hackbench 3 process 50000",
>> CPU avg_idle is 355566 nsec, and sched_migration_cost_ns = 500000,
>> so idle_balance bails at the top:
>>     if (this_rq->avg_idle < sysctl_sched_migration_cost ||
>>     ...
>>         goto out
>>
>> For other tests, we get past that clause but bail from a domain:
>>     if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
>>     ...
>>         break;
>>
>>> In the first case, try_steal() makes perfect sense to me. In the second
>>> case, I'm not sure if we really want to pull something if we know (well,
>>> we *think*) we're about to resume the execution of some other task.
>>
>> 355.566 microsec is enough time to steal, go on CPU, do useful work, and go
>> off CPU, particularly for chatty workloads like hackbench. The performance
>> data bear this out. For the higher loads, the average timeslice for
>> hackbench
>>
>
> Thanks for the explanation. AIUI the big difference here is that try_steal()
> is considerably cheaper than load_balance(), so the rq->avg_idle concerns
> matter less (or at least, on a considerably smaller scale).

Right.

>> Perhaps I could skip try_steal() if avg_idle is very small, although with
>> hackbench I have seen average time slice as small as 10 microsec under
>> high load and preemptions. I'll run some experiments.
>
> That might be a safe thing to do. In the same department, maybe we could
> skip try_steal() if we bail out of idle_balance() because
> !(this_rq->rd->overload). Although rq->rd->overload and cfs_overload_cpus
> are decoupled, they should express the same thing here.

I tried that in an earlier version of my code:

    new_tasks = idle_balance(rq, rf);
    if (new_tasks == 0 && rq->rd->overload)
        new_tasks = try_steal(rq, rf);

but I did not see any performance improvement vs without the overload check,
so I omitted it for simplicity.

- Steve

>>>> We could merge the stealing code into the idle_balance() code to get a
>>>> union of the two, but IMO that would be less readable.
>>>>
>>>> We could remove the core and socket levels from idle_balance()
>>>
>>> I understand that as only doing load_balance() at DIE level in
>>> idle_balance(), as that is what makes most sense to me (with big.LITTLE
>>> those misfit migrations are done at DIE level), is that correct?
>>
>> Correct.
>
>>> Also, with DynamIQ (next gen big.LITTLE) we could have asymmetry at MC
>>> level, which could cause issues there.
>>
>> We could keep idle_balance for this level and fall back to stealing as in
>> my patch, or you could extend the misfits bitmap to also include CPUs
>> with reduced memory bandwidth and active tasks
>> (if I understand the asymmetry
>> correctly).
>
> It's mostly µarch asymmetry, so by "asymmetry at MC level" I meant "we'll
> see the SD_ASYM_CPUCAPACITY flag at MC level". But if we tweak stealing
> to take misfit tasks into account (so we'd rely on SD_ASYM_CPUCAPACITY
> in some way or another), that could work.
>
>>>> and let
>>>> stealing handle those levels. I think that makes sense after stealing
>>>> performance is validated on more architectures, but we would still have
>>>> two different mechanisms.
>>>>
>>>> - Steve
>>>
>>> I'll try out those patches on top of the misfit series to see how the
>>> whole thing behaves.
>>
>> Very good, thanks.
>>
>> - Steve