Subject: Re: [RFC PATCH v2 00/17] Core scheduling v2
To: Ingo Molnar, Aubrey Li
Cc: Julien Desfossez, Vineeth Remanan Pillai, Nishanth Aravamudan,
 Peter Zijlstra, Tim Chen, Thomas Gleixner, Paul Turner, Linus Torvalds,
 Linux List Kernel Mailing, Subhra Mazumdar, Frédéric Weisbecker,
 Kees Cook, Greg Kerr, Phil Auld, Aaron Lu, Valentin Schneider,
 Mel Gorman, Pawan Gupta, Paolo Bonzini
References: <20190427142137.GA72051@gmail.com>
 <20190428093304.GA7393@gmail.com>
 <20190428121721.GA121434@gmail.com>
 <20190429061422.GA20939@gmail.com>
 <24bca399-5370-c4b5-725f-979db06bfc29@linux.intel.com>
 <20190429160058.GA82935@gmail.com>
 <20190430044250.GC73609@gmail.com>
From: "Li, Aubrey"
Message-ID: <9489d9e4-1dae-fc84-53eb-beb0c1418c0f@linux.intel.com>
Date: Sat, 18 May 2019 08:58:33 +0800
In-Reply-To: <20190430044250.GC73609@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2019/4/30 12:42, Ingo Molnar wrote:
>
>>> What's interesting is how in the over-saturated case (the last three
>>> rows: 128, 256 and 512 total threads) coresched-SMT leaves 20-30% CPU
>>> performance on the floor according to the load figures.
>>

Sorry for the delay; I finally got a chance to obtain some profiling results.
Here is the story on my side. I still used the previous 128/128 test case
(256 threads in total) and focused on CPU53 only (picked at random).

First, mpstat reports CPU utilization:
- baseline is 100%
- coresched-SMT is 87.51%

Then I traced the sched_switch tracepoint over a 100s sampling period:
- baseline: 14083 context switches, next task was idle 0 times
- coresched-SMT: 15101 context switches, next task was idle 880 times

So I guessed pick_next_task() is the most interesting place, and I dug into
the trace log of the coresched-SMT case:
- CPU53 selected the idle task 767 times (in line with the sched_switch data)

CPU53 selects the idle task through three branches in pick_next_task():
- pick pre selected: 765 times
- unconstrained pick: 1 time
- picked: swapper/53/0: 1 time

Where does CPU53's "pick pre selected" idle task come from?
I guessed it comes from its sibling CPU1, so I checked CPU1's trace log and
found:
- CPU1 selected the idle task on behalf of its sibling CPU53 800 times

So for CPU53, the most interesting part occurs in pick_task(), that is:
- the sibling CPU1 selected the idle task for it in pick_task()

Forgive me for pasting the routine here:
=====================================================
+// XXX fairness/fwd progress conditions
+static struct task_struct *
+pick_task(struct rq *rq, const struct sched_class *class, struct task_struct *max)
+{
+	struct task_struct *class_pick, *cookie_pick;
+	unsigned long cookie = 0UL;
+
+	/*
+	 * We must not rely on rq->core->core_cookie here, because we fail to reset
+	 * rq->core->core_cookie on new picks, such that we can detect if we need
+	 * to do single vs multi rq task selection.
+	 */
+
+	if (max && max->core_cookie) {
+		WARN_ON_ONCE(rq->core->core_cookie != max->core_cookie);
+		cookie = max->core_cookie;
+	}
+
+	class_pick = class->pick_task(rq);
+	if (!cookie)
+		return class_pick;
+
+	cookie_pick = sched_core_find(rq, cookie);
+	if (!class_pick)
+		return cookie_pick;
+
+	/*
+	 * If class > max && class > cookie, it is the highest priority task on
+	 * the core (so far) and it must be selected, otherwise we must go with
+	 * the cookie pick in order to satisfy the constraint.
+	 */
+	if (cpu_prio_less(cookie_pick, class_pick) && core_prio_less(max, class_pick))
+		return class_pick;
+
+	return cookie_pick;
+}
=================================================================

And the most relevant part of the log for this case:
=================================================================
<...>-21553 [001] dN.. 87341.514992: __schedule: cpu(1): selected: gemmbench/21294 ffff888823df8900
<...>-21553 [001] dN.. 87341.514992: __schedule: max: gemmbench/21294 ffff888823df8900
<...>-21553 [001] dN.. 87341.514995: __schedule: (swapper/53/0;140,0,0) ?< (sysbench/21503;140,457178607302,0)
<...>-21553 [001] dN.. 87341.514996: __schedule: (gemmbench/21294;119,219715519947,0) ?< (sysbench/21503;119,457178607302,0)
<...>-21553 [001] dN.. 87341.514996: __schedule: cpu(53): selected: swapper/53/0 0

It says:
- CPU1 selected gemmbench for itself
- gemmbench was assigned as max of this core
- then CPU1 ran pick_task() on behalf of CPU53
-- class->pick_task() selected sysbench for CPU53
-- cookie_pick was swapper (the idle task) for CPU53
-- class_pick (sysbench) unfortunately did not pass the priority check
- so the idle task was picked in the end (sadly).

So I think if we want to improve CPU utilization in this scenario, the
straightforward tweak is to pick class_pick when cookie_pick is idle. But I
know this violates the design philosophy of this proposal (avoiding L1TF).
Does it make sense to add a knob to switch between security and performance?

Any comments are welcome!

Thanks,
-Aubrey
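
The decision traced above, and the tweak being asked about, can be modeled in user space roughly as follows. This is only a sketch under stated assumptions, not kernel code: struct task, prio_less(), model_pick_task() and the prefer_busy knob are hypothetical names. The comparison is a simplified stand-in for cpu_prio_less()/core_prio_less(), whose real definitions are not shown in this mail; it is loosely based on the first two fields printed in the trace, which are assumed here to be priority and vruntime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct task_struct: only what the model needs. */
struct task {
	const char *comm;
	int prio;                    /* lower value = higher priority (assumption) */
	unsigned long long vruntime; /* tie-breaker at equal prio (assumption) */
	unsigned long cookie;        /* 0 = untagged */
	bool is_idle;
};

/*
 * Simplified stand-in for cpu_prio_less()/core_prio_less():
 * returns true if a has lower priority than b.
 */
static bool prio_less(const struct task *a, const struct task *b)
{
	if (a->prio != b->prio)
		return a->prio > b->prio;
	return a->vruntime > b->vruntime;
}

/*
 * Mirrors the control flow of pick_task() quoted above, with class_pick and
 * cookie_pick passed in instead of being looked up on a runqueue.
 * prefer_busy is the hypothetical security/performance knob: when set, an
 * idle cookie_pick is overridden by a runnable class_pick.
 */
static const struct task *
model_pick_task(const struct task *class_pick, const struct task *cookie_pick,
		const struct task *max, bool prefer_busy)
{
	unsigned long cookie = 0UL;

	if (max && max->cookie)
		cookie = max->cookie;

	if (!cookie)
		return class_pick;
	if (!class_pick)
		return cookie_pick;

	/* The model assumes a cookie lookup always yields a task (idle at worst). */
	assert(cookie_pick != NULL);

	if (prio_less(cookie_pick, class_pick) && prio_less(max, class_pick))
		return class_pick;

	/* Hypothetical tweak: trade the cookie constraint for utilization. */
	if (prefer_busy && cookie_pick->is_idle)
		return class_pick;

	return cookie_pick;
}
```

Fed with the values from the trace (gemmbench as max, sysbench as class_pick, swapper as cookie_pick, and made-up distinct cookies for the two benchmarks), the model returns the idle task with prefer_busy off and sysbench with it on. The second outcome puts differently tagged tasks on sibling threads, which is exactly the exposure the proposal is meant to prevent, so such a knob would have to default to off.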