From: Valentin Schneider
To: "Li, Aubrey"
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
    vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
    rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
    tim.c.chen@linux.intel.com, linux-kernel@vger.kernel.org,
    Aubrey Li, Qais Yousef, Jiang Biao
Subject: Re: [RFC PATCH v3] sched/fair: select idle cpu from idle cpumask for task wakeup
Date: Mon, 09 Nov 2020 15:54:36 +0000
References: <20201021150335.1103231-1-aubrey.li@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 09/11/20 13:40,
Li, Aubrey wrote:
> On 2020/11/7 5:20, Valentin Schneider wrote:
>>
>> On 21/10/20 16:03, Aubrey Li wrote:
>>> From: Aubrey Li
>>>
>>> Added idle cpumask to track idle cpus in sched domain. When a CPU
>>> enters idle, its corresponding bit in the idle cpumask will be set,
>>> and when the CPU exits idle, its bit will be cleared.
>>>
>>> When a task wakes up to select an idle cpu, scanning idle cpumask
>>> has lower cost than scanning all the cpus in last level cache domain,
>>> especially when the system is heavily loaded.
>>>
>>
>> FWIW I gave this a spin on my arm64 desktop (Ampere eMAG, 32 core). I get
>> some barely noticeable (AIUI not statistically significant for bench sched)
>> changes for 100 iterations of:
>>
>> | bench                              | metric   |   mean |     std |    q90 |    q99 |
>> |------------------------------------+----------+--------+---------+--------+--------|
>> | hackbench --loops 5000 --groups 1  | duration | -1.07% |  -2.23% | -0.88% | -0.25% |
>> | hackbench --loops 5000 --groups 2  | duration | -0.79% | +30.60% | -0.49% | -0.74% |
>> | hackbench --loops 5000 --groups 4  | duration | -0.54% |  +6.99% | -0.21% | -0.12% |
>> | perf bench sched pipe -T -l 100000 | ops/sec  | +1.05% |  -2.80% | -0.17% | +0.39% |
>>
>> q90 & q99 being the 90th and 99th percentile.
>>
>> Base was tip/sched/core at:
>> d8fcb81f1acf ("sched/fair: Check for idle core in wake_affine")
>
> Thanks for the data, Valentin! So does the negative value mean improvement?
>

For hackbench yes (shorter is better); for perf bench sched no, since the
metric here is ops/sec so higher is better.

That said, I (use a tool that) run a 2-sample Kolmogorov–Smirnov test
against the two sample sets (tip/sched/core vs tip/sched/core+patch), and
the p-value for perf bench sched is quite high (~0.9) which means we can't
reject the hypothesis that both sample sets come from the same
distribution; long story short we can't say whether the patch had a
noticeable impact for that benchmark.

> If so the data looks expected to me. As we set idle cpumask every time we
> enter idle, but only clear it at the tick frequency, so if the workload
> is not heavy enough, there could be a lot of idle during two ticks, so idle
> cpumask is almost equal to sched_domain_span(sd), which makes no difference.
>
> But if the system load is heavy enough, CPU has few/no chance to enter idle,
> then idle cpumask can be cleared during tick, which makes the bit number in
> sds_idle_cpus(sd->shared) far less than the bit number in sched_domain_span(sd)
> if llc domain has large count of CPUs.
>

With hackbench -g 4 that's 160 tasks (against 32 CPUs, all under same LLC),
although the work done by each task isn't much. I'll try bumping that a
notch, or increasing the size of the messages.

> For example, if I run 4 x overcommit uperf on a system with 192 CPUs,
> I observed:
> - default, the average of this_sd->avg_scan_cost is 223.12ns
> - patch, the average of this_sd->avg_scan_cost is 63.4ns
>
> And select_idle_cpu is called 7670253 times per second, so for every CPU the
> scan cost is saved (223.12 - 63.4) * 7670253 / 192 = 6.4ms. As a result, I
> saw uperf throughput improved by 60+%.
>

That's ~1.2s of "extra" CPU time per second, which sounds pretty cool.

I don't think I've ever played with uperf. I'll give that a shot someday.

> Thanks,
> -Aubrey
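
For anyone wanting to double-check the scan-cost arithmetic quoted above, here is a minimal sketch in Python. It only replays the numbers from the thread; the constant and function names are made up for illustration, and it assumes the 7670253 calls/s figure is system-wide (which is what the division by 192 implies).

```python
# Figures quoted in the thread (4x overcommit uperf on a 192-CPU system).
# Names below are illustrative only; they do not come from the kernel.
AVG_SCAN_COST_DEFAULT_NS = 223.12  # this_sd->avg_scan_cost without the patch
AVG_SCAN_COST_PATCH_NS = 63.4      # this_sd->avg_scan_cost with the patch
CALLS_PER_SECOND = 7_670_253       # select_idle_cpu() calls/s (assumed system-wide)
NR_CPUS = 192


def scan_time_saved_ns(default_ns, patch_ns, calls_per_second):
    """Total scan time saved per wall-clock second, in nanoseconds."""
    return (default_ns - patch_ns) * calls_per_second


total_ns = scan_time_saved_ns(
    AVG_SCAN_COST_DEFAULT_NS, AVG_SCAN_COST_PATCH_NS, CALLS_PER_SECOND
)
per_cpu_ms = total_ns / NR_CPUS / 1e6  # share of the saving per CPU, in ms
total_s = total_ns / 1e9               # saving summed over all CPUs, in s

print(f"saved per CPU:     {per_cpu_ms:.1f} ms per second")
print(f"saved system-wide: {total_s:.1f} s per second")
```

This gives roughly 6.4 ms per CPU and 1.2 s system-wide for each second of wall-clock time, consistent with both figures mentioned in the exchange.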