Subject: Re: [RFC PATCH v2 00/17] Core scheduling v2
From: "Li, Aubrey"
Date: Sat, 18 May 2019 09:08:12 +0800
To: Ingo Molnar, Aubrey Li
Cc: Julien Desfossez, Vineeth Remanan Pillai, Nishanth Aravamudan,
    Peter Zijlstra, Tim Chen, Thomas Gleixner, Paul Turner, Linus Torvalds,
    Linux List Kernel Mailing, Subhra Mazumdar, Frédéric Weisbecker,
    Kees Cook, Greg Kerr, Phil Auld, Aaron Lu, Valentin Schneider,
    Mel Gorman, Pawan Gupta, Paolo Bonzini
Message-ID: <44e6eba0-abf1-3251-c16a-96ee28dcc14b@linux.intel.com>
In-Reply-To: <9489d9e4-1dae-fc84-53eb-beb0c1418c0f@linux.intel.com>
References: <20190427142137.GA72051@gmail.com>
    <20190428093304.GA7393@gmail.com>
    <20190428121721.GA121434@gmail.com>
    <20190429061422.GA20939@gmail.com>
    <24bca399-5370-c4b5-725f-979db06bfc29@linux.intel.com>
    <20190429160058.GA82935@gmail.com>
    <20190430044250.GC73609@gmail.com>
    <9489d9e4-1dae-fc84-53eb-beb0c1418c0f@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2019/5/18 8:58, Li, Aubrey wrote:
> On 2019/4/30 12:42, Ingo Molnar wrote:
>>
>>>> What's interesting is how in the over-saturated case (the last three
>>>> rows: 128, 256 and 512 total threads) coresched-SMT leaves 20-30% CPU
>>>> performance on the floor according to the load figures.
>>>
>
> Sorry for the delay; I got a chance to obtain some profiling results. Here
> is the story on my side. I still used the previous 128/128 test case
> (256 threads in total) and focused on CPU53 (picked at random) only.
>
> First, mpstat reports CPU utilization:
> - baseline is 100%
> - coresched-SMT is 87.51%
>
> Then I traced the sched_switch tracepoint over a 100s sampling period:
> - baseline: 14083 context switches, next task idle 0 times
> - coresched-SMT: 15101 context switches, next task idle 880 times
>
> So I guess pick_next_task() is the most interesting place, and I dug
> into the trace log for the coresched-SMT case:
> - CPU53 selected the idle task 767 times (matched with the data of sched_switch)
>
> CPU53 selects the idle task through 3 branches in pick_next_task():
> - pick pre selected 765 times
> - unconstrained pick 1 time
> - picked: swapper/53/0 1 time
>
> Where does CPU53's "pick pre selected idle task" come from?
> I guess it comes from its sibling CPU1, so I checked CPU1's trace log and found:
> - CPU1 helped its sibling CPU53 select the idle task 800 times
>
> So for CPU53, the most interesting part occurs in pick_task(), that is:
> - The sibling CPU1 helped to select the idle task in pick_task()
>
> Forgive me for pasting this routine here:
> =====================================================
> +// XXX fairness/fwd progress conditions
> +static struct task_struct *
> +pick_task(struct rq *rq, const struct sched_class *class, struct task_struct *max)
> +{
> +	struct task_struct *class_pick, *cookie_pick;
> +	unsigned long cookie = 0UL;
> +
> +	/*
> +	 * We must not rely on rq->core->core_cookie here, because we fail to reset
> +	 * rq->core->core_cookie on new picks, such that we can detect if we need
> +	 * to do single vs multi rq task selection.
> +	 */
> +
> +	if (max && max->core_cookie) {
> +		WARN_ON_ONCE(rq->core->core_cookie != max->core_cookie);
> +		cookie = max->core_cookie;
> +	}
> +
> +	class_pick = class->pick_task(rq);
> +	if (!cookie)
> +		return class_pick;
> +
> +	cookie_pick = sched_core_find(rq, cookie);
> +	if (!class_pick)
> +		return cookie_pick;
> +
> +	/*
> +	 * If class > max && class > cookie, it is the highest priority task on
> +	 * the core (so far) and it must be selected, otherwise we must go with
> +	 * the cookie pick in order to satisfy the constraint.
> +	 */
> +	if (cpu_prio_less(cookie_pick, class_pick) && core_prio_less(max, class_pick))
> +		return class_pick;
> +
> +	return cookie_pick;
> +}
> =================================================================
>
> And the most relevant log for this case:
> =================================================================
> <...>-21553 [001] dN.. 87341.514992: __schedule: cpu(1): selected: gemmbench/21294 ffff888823df8900
> <...>-21553 [001] dN.. 87341.514992: __schedule: max: gemmbench/21294 ffff888823df8900
> <...>-21553 [001] dN.. 87341.514995: __schedule: (swapper/53/0;140,0,0) ?< (sysbench/21503;140,457178607302,0)
> <...>-21553 [001] dN.. 87341.514996: __schedule: (gemmbench/21294;119,219715519947,0) ?< (sysbench/21503;119,457178607302,0)
> <...>-21553 [001] dN.. 87341.514996: __schedule: cpu(53): selected: swapper/53/0 0
>
> It says:
> - CPU1 selected gemmbench for itself
> - and gemmbench was assigned to max of this core
> - then CPU1 helped CPU53 to pick_task()
> -- CPU1 used class->pick_task(), selected sysbench for CPU53
> -- CPU1 used cookie_pick, selected swapper (idle task) for CPU53
> -- the class_pick (sysbench) unfortunately didn't pass the priority check
> - the idle task was picked in the end (sadly).
>
> So, I think if we want to improve CPU utilization under this scenario,
> the straightforward tweak is to pick class_pick if cookie_pick is idle.

Another quick thought: in CPU53's own pick_next_task() path, could we give up
the task pre-selected by CPU1 if that task is idle?

> But I know, this is a violation of the design philosophy (avoid L1TF) of
> this proposal.
>
> Does it make sense to add a knob to switch between security and performance?
> Any comments are welcome!
>
> Thanks,
> -Aubrey
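
To make the trade-off discussed around the quoted pick_task() concrete, here is a minimal user-space sketch in C. It is not kernel code and not part of the patch: struct task, prio_less(), pick_task_sketch() and the prefer_busy flag are hypothetical names, and the single prio_less() ordering (priority first, then vruntime) is an assumed simplification standing in for both cpu_prio_less() and core_prio_less(). The candidates are passed in directly instead of being looked up from a runqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct task_struct. */
struct task {
	const char *name;
	int prio;                    /* lower value = higher priority */
	unsigned long long vruntime; /* tie-breaker when prio is equal */
	unsigned long cookie;        /* 0 = task carries no core cookie */
	bool is_idle;
};

/* Assumed ordering: returns true if a has lower priority than b. */
static bool prio_less(const struct task *a, const struct task *b)
{
	if (a->prio != b->prio)
		return a->prio > b->prio;
	return a->vruntime > b->vruntime;
}

/*
 * Follows the decision structure of the quoted pick_task():
 *   class_pick  - what the scheduling class would run on this CPU
 *   cookie_pick - the cookie-compatible candidate (idle if none matches)
 *   max         - the highest-priority task chosen on the core so far
 * prefer_busy enables the tweak under discussion: never force the idle
 * task onto a CPU that has a runnable class_pick.
 */
static const struct task *
pick_task_sketch(const struct task *class_pick, const struct task *cookie_pick,
		 const struct task *max, bool prefer_busy)
{
	/* No cookie constraint on the core: take the class pick as-is. */
	if (!max || !max->cookie)
		return class_pick;

	/* Assumption: a cookie lookup always yields a task, possibly idle. */
	assert(cookie_pick != NULL);

	if (!class_pick)
		return cookie_pick;

	/* Proposed tweak: trade isolation for utilization. */
	if (prefer_busy && cookie_pick->is_idle)
		return class_pick;

	/* class_pick wins only if it beats both cookie_pick and max. */
	if (prio_less(cookie_pick, class_pick) && prio_less(max, class_pick))
		return class_pick;

	return cookie_pick;
}
```

With values taken loosely from the trace (an idle cookie_pick, sysbench as class_pick, gemmbench as max with a smaller vruntime), the sketch returns the idle task when prefer_busy is false and sysbench when it is true. The second outcome puts two tasks with different cookies on sibling threads, which is the isolation the proposal is meant to provide, so a flag like this would presumably only make sense behind an explicit knob.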