Subject: Re: [RFC PATCH v3] sched/fair: select idle cpu from idle cpumask for task wakeup
To: Valentin Schneider
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
 vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, tim.c.chen@linux.intel.com,
 linux-kernel@vger.kernel.org, Aubrey Li, Qais Yousef, Jiang Biao
References: <20201021150335.1103231-1-aubrey.li@linux.intel.com>
From: "Li, Aubrey"
Date: Wed, 11 Nov 2020 16:38:22 +0800

On 2020/11/9 23:54, Valentin Schneider wrote:
>
> On 09/11/20 13:40, Li, Aubrey wrote:
>> On 2020/11/7 5:20, Valentin Schneider wrote:
>>>
>>> On 21/10/20 16:03, Aubrey Li wrote:
>>>> From: Aubrey Li
>>>>
>>>> Added idle cpumask to track idle cpus in sched domain. When a CPU
>>>> enters idle, its corresponding bit in the idle cpumask will be set,
>>>> and when the CPU exits idle, its bit will be cleared.
>>>>
>>>> When a task wakes up and selects an idle cpu, scanning the idle
>>>> cpumask has lower cost than scanning all the cpus in the last level
>>>> cache domain, especially when the system is heavily loaded.
>>>>
>>>
>>> FWIW I gave this a spin on my arm64 desktop (Ampere eMAG, 32 core). I get
>>> some barely noticeable (AIUI not statistically significant for bench sched)
>>> changes for 100 iterations of:
>>>
>>> | bench                              | metric   | mean   | std     | q90    | q99    |
>>> |------------------------------------+----------+--------+---------+--------+--------|
>>> | hackbench --loops 5000 --groups 1  | duration | -1.07% | -2.23%  | -0.88% | -0.25% |
>>> | hackbench --loops 5000 --groups 2  | duration | -0.79% | +30.60% | -0.49% | -0.74% |
>>> | hackbench --loops 5000 --groups 4  | duration | -0.54% | +6.99%  | -0.21% | -0.12% |
>>> | perf bench sched pipe -T -l 100000 | ops/sec  | +1.05% | -2.80%  | -0.17% | +0.39% |
>>>
>>> q90 & q99 being the 90th and 99th percentiles.
>>>
>>> Base was tip/sched/core at:
>>>   d8fcb81f1acf ("sched/fair: Check for idle core in wake_affine")
>>
>> Thanks for the data, Valentin! So does a negative value mean improvement?
>>
>
> For hackbench yes (shorter is better); for perf bench sched no, since the
> metric there is ops/sec, so higher is better.
>
> That said, I (use a tool that) ran a 2-sample Kolmogorov–Smirnov test
> against the two sample sets (tip/sched/core vs tip/sched/core+patch), and
> the p-value for perf bench sched is quite high (~0.9), which means we
> can't reject that both sample sets come from the same distribution; long
> story short, we can't say whether the patch had a noticeable impact on
> that benchmark.
>
>> If so, the data looks as expected to me. We set the idle cpumask every
>> time a CPU enters idle but only clear it at the tick frequency, so if the
>> workload is not heavy enough, there can be a lot of idle time between two
>> ticks and the idle cpumask stays almost equal to sched_domain_span(sd),
>> which makes no difference.
>>
>> But if the system load is heavy enough, a CPU has little or no chance to
>> enter idle, so its bit can be cleared at the tick, which makes the number
>> of set bits in sds_idle_cpus(sd->shared) far smaller than the number of
>> bits in sched_domain_span(sd) when the llc domain has a large number of
>> CPUs.
>>
>
> With hackbench -g 4 that's 160 tasks (against 32 CPUs, all under the same
> LLC), although the work done by each task isn't much. I'll try bumping
> that up a notch, or increasing the size of the messages.
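(For readers following the thread: the mechanism under discussion boils
down to something like the toy userspace model below. The function names
and the single 64-bit mask are made up for illustration; the real patch
uses the kernel's cpumask API and per-LLC sched_domain_shared state.)

#include <stdint.h>
#include <stdio.h>

/* bit i set => CPU i is (believed to be) idle */
static uint64_t idle_mask;

static void cpu_enter_idle(int cpu) { idle_mask |= 1ULL << cpu; }
static void cpu_exit_idle(int cpu)  { idle_mask &= ~(1ULL << cpu); }

/*
 * Wakeup path: scan only CPUs whose idle bit is set, instead of every
 * CPU in the LLC domain.
 */
static int pick_idle_cpu(void)
{
	uint64_t m = idle_mask;

	if (!m)
		return -1;		/* no idle CPU tracked */
	return __builtin_ctzll(m);	/* lowest set bit = first idle CPU */
}

int main(void)
{
	cpu_enter_idle(3);
	cpu_enter_idle(17);
	cpu_exit_idle(3);
	printf("wakeup target: CPU %d\n", pick_idle_cpu()); /* CPU 17 */
	return 0;
}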
As long as the system is busy enough that it does not schedule the idle
thread, the idle cpumask will shrink tick by tick, and we'll see a lower
sd->avg_scan_cost. This version of the patch sets a CPU's idle bit every
time it enters idle, so a heavy load is needed to keep the scheduler from
switching the idle thread in.

I personally prefer the logic in the previous versions (a toy sketch of it
follows in the P.S. below):

- when a cpu enters idle, the cpuidle governor returns a "stop_tick" flag
- if the tick is stopped, which indicates the CPU is not busy, the CPU can
  be set in the idle cpumask
- otherwise, the CPU is likely going to work very soon, so it is not set
  in the idle cpumask

But apparently I missed the "nohz=off" case in the previous
implementation. For "nohz=off" I chose to keep the original behavior,
which didn't satisfy Mel. I can probably refine it in the next version.

Do you have any suggestions?

Thanks,
-Aubrey
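P.S. For illustration, a toy model of that earlier stop_tick-based policy,
as standalone userspace C. The names are made up here and this is not the
kernel's cpuidle or cpumask API; it only sketches the decision described
in the list above:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t idle_mask;

/*
 * Earlier-revision policy: trust the cpuidle governor's tick decision
 * instead of marking the CPU idle unconditionally on idle entry.
 */
static void cpu_enter_idle(int cpu, bool stop_tick)
{
	if (stop_tick)
		idle_mask |= 1ULL << cpu; /* tick stopped: CPU is truly quiet */
	/* else: the CPU expects work soon, leave its idle bit clear */
}

int main(void)
{
	cpu_enter_idle(5, true);	/* marked idle */
	cpu_enter_idle(9, false);	/* governor kept the tick: not marked */
	printf("idle_mask = %#llx\n", (unsigned long long)idle_mask);
	return 0;
}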