Subject: Re: [RFC PATCH v3 00/16] Core scheduling v3
To: Aaron Lu, Aubrey Li
Cc: Julien Desfossez, Subhra Mazumdar, Vineeth Remanan Pillai,
    Nishanth Aravamudan, Peter Zijlstra, Tim Chen, Ingo Molnar,
    Thomas Gleixner, Paul Turner, Linus Torvalds,
    Linux List Kernel Mailing, Frédéric Weisbecker, Kees Cook,
    Greg Kerr, Phil Auld, Valentin Schneider, Mel Gorman,
    Pawan Gupta, Paolo Bonzini
From: "Li, Aubrey"
Date: Fri, 26 Jul 2019 05:42:57 +0800
In-Reply-To: <20190725143003.GA992@aaronlu>

On 2019/7/25 22:30, Aaron Lu wrote:
> On Mon, Jul 22, 2019 at 06:26:46PM +0800, Aubrey Li wrote:
>> The granularity period of util_avg seems too large to decide task priority
>> during pick_task(); at least it is in my case: cfs_prio_less() always picked
>> the core max task, so pick_task() eventually picked idle, which makes this
>> change not very helpful for my case.
>>
>> -0 [057] dN.. 83.716973: __schedule: max: sysbench/2578 ffff889050f68600
>> -0 [057] dN.. 83.716974: __schedule: (swapper/5/0;140,0,0) ?< (mysqld/2511;119,1042118143,0)
>> -0 [057] dN.. 83.716975: __schedule: (sysbench/2578;119,96449836,0) ?< (mysqld/2511;119,1042118143,0)
>> -0 [057] dN.. 83.716975: cfs_prio_less: picked sysbench/2578 util_avg: 20 527 -507 <======= here===
>> -0 [057] dN.. 83.716976: __schedule: pick_task cookie pick swapper/5/0 ffff889050f68600
>
> I tried a different approach based on vruntime, with 3 patches following.
>
> When the two tasks are on the same CPU, no change is made: I still route
> the two sched entities up till they are in the same group (cfs_rq) and
> then do the vruntime comparison.
>
> When the two tasks are on different threads of the same core, the root
> level sched_entities to which the two tasks belong will be used to do
> the comparison.
>
> An ugly illustration for the cross-CPU case:
>
>       cpu0                 cpu1
>      /  |  \              /  |  \
>   se1  se2  se3        se4  se5  se6
>        /  \                      /  \
>     se21  se22               se61  se62
>
> Assume CPU0 and CPU1 are SMT siblings and task A's se is se21 while
> task B's se is se61. To compare the priority of task A and task B, we
> compare the priority of se2 and se6. The smaller vruntime wins.
>
> To make this work, the root level ses on both CPUs should have a common
> cfs_rq min vruntime, which I call the core cfs_rq min vruntime.
>
> This is mostly done in patch2/3.
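For illustration, here is a minimal, self-contained user-space sketch of the
comparison described above. The struct layout and the names (se_model,
cfs_rq_model, core_min_vruntime, core_prio_less) are stand-ins, not the
actual sched_entity/cfs_rq code from the patches.

/*
 * Toy model (plain user-space C, not the kernel patches) of the cross-CPU
 * priority comparison: walk each task's sched entity up to the root level
 * of its CPU's runqueue, then compare the root-level vruntimes relative to
 * a min_vruntime shared by the whole core.
 */
#include <stdio.h>

struct cfs_rq_model {
	unsigned long long core_min_vruntime;	/* shared across SMT siblings */
};

struct se_model {
	struct se_model *parent;		/* NULL for a root-level se */
	struct cfs_rq_model *cfs_rq;		/* runqueue this se belongs to */
	unsigned long long vruntime;
};

/* Walk up the hierarchy until the root-level sched entity is reached. */
static struct se_model *root_se(struct se_model *se)
{
	while (se->parent)
		se = se->parent;
	return se;
}

/*
 * Return 1 when task A should be preferred over task B: the root-level
 * vruntimes are normalized against the common core-wide min_vruntime so
 * that values coming from sibling CPUs are comparable.
 */
static int core_prio_less(struct se_model *a, struct se_model *b)
{
	struct se_model *ra = root_se(a), *rb = root_se(b);
	long long da = (long long)(ra->vruntime - ra->cfs_rq->core_min_vruntime);
	long long db = (long long)(rb->vruntime - rb->cfs_rq->core_min_vruntime);

	return da < db;		/* smaller normalized vruntime wins */
}

int main(void)
{
	/* Both root cfs_rqs carry the same core-wide min vruntime. */
	struct cfs_rq_model rq_cpu0 = { .core_min_vruntime = 1000 };
	struct cfs_rq_model rq_cpu1 = { .core_min_vruntime = 1000 };

	/* cpu0: se2 is root level, task A's se21 sits below it. */
	struct se_model se2  = { .parent = NULL, .cfs_rq = &rq_cpu0, .vruntime = 1500 };
	struct se_model se21 = { .parent = &se2, .cfs_rq = &rq_cpu0, .vruntime = 0 };

	/* cpu1: se6 is root level, task B's se61 sits below it. */
	struct se_model se6  = { .parent = NULL, .cfs_rq = &rq_cpu1, .vruntime = 1200 };
	struct se_model se61 = { .parent = &se6, .cfs_rq = &rq_cpu1, .vruntime = 0 };

	/* Comparing task A and task B boils down to se2 vs se6: se6 wins here. */
	printf("prefer task A over task B: %d\n", core_prio_less(&se21, &se61));
	return 0;
}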
> Test:
> 1 wrote a cpu-intensive program that does nothing but while(1) in
>   main(); let's call it cpuhog;
> 2 started 2 cgroups, with one cgroup's cpuset bound to cpu2 and the
>   other bound to cpu3; cpu2 and cpu3 are SMT siblings on the test VM;
> 3 enabled cpu.tag for the two cgroups;
> 4 started one cpuhog task in each cgroup;
> 5 killed both cpuhog tasks after 10 seconds;
> 6 checked each cgroup's cpu usage.
>
> If the tasks are scheduled fairly, then each cgroup's cpu usage should be
> around 5s.
>
> With v3, the cpu usage of the two cgroups is sometimes 3s and 7s,
> sometimes 1s and 9s.
>
> With the 3 patches applied, the numbers are mostly around 5s and 5s.
>
> Another test is starting two cgroups simultaneously with cpu.tag set,
> with one cgroup running will-it-scale/page_fault1_processes -t 16 -s 30
> and the other running will-it-scale/page_fault2_processes -t 16 -s 30.
> With v3, like I said last time, the later started page_fault processes
> can't start running. With the 3 patches applied, both run at the same
> time, with each CPU getting a relatively fair score:
>
> output line of 16 page_fault1 processes in 1 second interval:
> min:105225 max:131716 total:1872322
>
> output line of 16 page_fault2 processes in 1 second interval:
> min:86797 max:110554 total:1581177
>
> Note the values of min and max: the smaller the gap between them, the
> better the fairness.
>
> Aubrey,
>
> I haven't been able to run your workload yet...
>

No worry, let me try to see how it works.

Thanks,
-Aubrey