Received: by 2002:a25:e74b:0:0:0:0:0 with SMTP id e72csp1729895ybh; Tue, 14 Jul 2020 06:02:48 -0700 (PDT) X-Google-Smtp-Source: ABdhPJy8FF8SGo/iuZeXP8ktWLek7ryBs+4EErZxf/sIs5sQDLx6ww7eJz5wcfNsykrUEEwiSVXo X-Received: by 2002:a05:6402:542:: with SMTP id i2mr4536130edx.318.1594731767425; Tue, 14 Jul 2020 06:02:47 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1594731767; cv=none; d=google.com; s=arc-20160816; b=f8tRfvQVCELe7ywB7kJuSjHCKtXiG7xxrvCtmTRANEWgW86NLNODZs7g5KCwu89jGp AI4l+bvPmz2ZyJMNNf7OMUmpN45mqrdnFxyZ54KSeskrQtSBGu0bwHUgWajXq41WquHf EFSoHBrTH1BnO8yD/qIaTpuli5j4IgWe9gwICY7D1WoOK9pzAjyEfUd7bGP8zVkrCffx 8Wlo/KPJTy977hQBjFbzllCr8RcvkDuIPVYUSW2MHwz8tpPQQAf8nk+CyZEm8dl4s3tL YMOCUea9oBcyA/8zgx8fGRil/DonGv3MifKZxOVNv9ifW/3wRMMkC6IBXaeiQ7lWR//Q zGzA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=UARzO1WFxZQcMjBB8yKfPoPlNkMAzXMUicHr6tVQpkQ=; b=UCnyCqhfBSVtKklLeGD7bg9X1UQZndyRfqHGYELnJI0e2riFJ7pM+asTeTNDcke7Ie ZdJg96fqZWqDpiVn5BIZU3i8eh84u6jGtI8PglCO/nZLxeN79uOGGGqVankr/yAWxcxB 85zejWKyb8v/m7Pg/h+C3yCHXupjmOUKcQbMojVUBGoWzG407JslwwPi95Xy+ymlSdhx kLRmjiixGefc3PtKI7YD1WQDigCc2uqRZXqffghmNXkQFXgaa3O1I/ql85d3tQ3TR28X S7rzAJrYRr3QDfuzrn0xXRaq5a8crzW0mtj01q6gcDF6A+Y0mSkuOkXXDQfDH4fO41rm 0+VA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=QPiADLSq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id g19si11745174edv.174.2020.07.14.06.02.23; Tue, 14 Jul 2020 06:02:47 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=QPiADLSq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726955AbgGNM7p (ORCPT + 99 others); Tue, 14 Jul 2020 08:59:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40164 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726354AbgGNM7o (ORCPT ); Tue, 14 Jul 2020 08:59:44 -0400 Received: from mail-qt1-x842.google.com (mail-qt1-x842.google.com [IPv6:2607:f8b0:4864:20::842]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9A59FC061755 for ; Tue, 14 Jul 2020 05:59:44 -0700 (PDT) Received: by mail-qt1-x842.google.com with SMTP id o38so12642362qtf.6 for ; Tue, 14 Jul 2020 05:59:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=UARzO1WFxZQcMjBB8yKfPoPlNkMAzXMUicHr6tVQpkQ=; b=QPiADLSq7fELwQRpXhcH/HkF/SzqCS7Dxdc3RMfwu+dOJOjE5PPFKYkZD/QX40fS0F 7SB6TdT8j1TAwUtSvYR3AAd/xrM+DWFqNpy60fem81JA7mMpr0dQ5mA4AvxQ9z8CcEMx kgIf3B4h0nYBjjPJcWUHzjyoVgGjdAjE/xInulticD7yy1gwqaq3UHMC+uBitkyQN9gE EDlwADqCFqrkF3fJYBmGdyv5VTsBU5RV2aw9knkV7O6vCfiZqSJYBpDa4n1XK2wzO/+u G+joeYEPjKZjTKLkjhc2PJZS/4FpdtxwZsWEFfCQuPnCekqranbRnflXIMUOrfAdVMkt kqVg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=UARzO1WFxZQcMjBB8yKfPoPlNkMAzXMUicHr6tVQpkQ=; b=Y3htpxKdCyheY7f7KG3ooyyR1/O/qFNMzB/LsWEY3PYrHAJLRxIxy/wbOZO1957Ixu xknqvugrV3p60BFynd51ItS7DSyc9cvf1T1dlK1EmVWIebTCBhzQqqTANoqZJ9NcBx/n mi4Xska2X1bs/d0U+FfUySraGYr/JVkLvKrWeG5e7Xk606lmzOpk35J/eYOqvYNkJ2oR ntP8FYvnKg235hhYzhxi0rJjHAygSnOAGYRMlvmTo1GJFwJWpsC7Ba8MLfUL3sG1kZHS Q71mbfNhFLu85c/NAKXU/tm24TFjnH/f5d34AjwxOzg9PNkTmBKbm3BdWQFFgoQKeYGo PmOQ== X-Gm-Message-State: AOAM5332HY/s1vls5Vpk7L19nCPFyPdi/VsGyONCa9p7ApCboYNWOBjf gS0qrNe/cRdBUW4eGHdV0XXKlLNXBWtauvci X-Received: by 2002:ac8:4b4f:: with SMTP id e15mr4588639qts.305.1594731583374; Tue, 14 Jul 2020 05:59:43 -0700 (PDT) Received: from Trinity.hsd1.ma.comcast.net ([2601:189:8200:19b0:8007:e577:f459:885a]) by smtp.gmail.com with ESMTPSA id f4sm22582162qtv.59.2020.07.14.05.59.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 14 Jul 2020 05:59:42 -0700 (PDT) From: peter.puhov@linaro.org To: linux-kernel@vger.kernel.org Cc: peter.puhov@linaro.org, robert.foley@linaro.org, Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman Subject: [PATCH v1] sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal Date: Tue, 14 Jul 2020 08:59:41 -0400 Message-Id: <20200714125941.4174-1-peter.puhov@linaro.org> X-Mailer: git-send-email 2.20.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Peter Puhov v0: https://lkml.org/lkml/2020/6/16/1286 Changes in v1: - Test results formatted in a table form as suggested by Valentin Schneider - Added explanation by Vincent Guittot why nr_running may not be sufficient In slow path, when selecting idlest group, if both groups have type group_has_spare, only idle_cpus count gets compared. As a result, if multiple tasks are created in a tight loop, and go back to sleep immediately (while waiting for all tasks to be created), they may be scheduled on the same core, because CPU is back to idle when the new fork happen. For example: sudo perf record -e sched:sched_wakeup_new -- \ sysbench threads --threads=4 run ... total number of events: 61582 ... sudo perf script sysbench 129378 [006] 74586.633466: sched:sched_wakeup_new: sysbench:129380 [120] success=1 CPU:007 sysbench 129378 [006] 74586.634718: sched:sched_wakeup_new: sysbench:129381 [120] success=1 CPU:007 sysbench 129378 [006] 74586.635957: sched:sched_wakeup_new: sysbench:129382 [120] success=1 CPU:007 sysbench 129378 [006] 74586.637183: sched:sched_wakeup_new: sysbench:129383 [120] success=1 CPU:007 This may have negative impact on performance for workloads with frequent creation of multiple threads. In this patch we are using group_util to select idlest group if both groups have equal number of idle_cpus. Comparing the number of idle cpu is not enough in this case, because the newly forked thread sleeps immediately and before we select the cpu for the next one. This is shown in the trace where the same CPU7 is selected for all wakeup_new events. That's why, looking at utilization when there is the same number of CPU is a good way to see where the previous task was placed. Using nr_running doesn't solve the problem because the newly forked task is not running and the cpu would not have been idle in this case and an idle CPU would have been selected instead. With this patch newly created tasks would be better distributed. With this patch: sudo perf record -e sched:sched_wakeup_new -- \ sysbench threads --threads=4 run ... total number of events: 74401 ... sudo perf script sysbench 129455 [006] 75232.853257: sched:sched_wakeup_new: sysbench:129457 [120] success=1 CPU:008 sysbench 129455 [006] 75232.854489: sched:sched_wakeup_new: sysbench:129458 [120] success=1 CPU:009 sysbench 129455 [006] 75232.855732: sched:sched_wakeup_new: sysbench:129459 [120] success=1 CPU:010 sysbench 129455 [006] 75232.856980: sched:sched_wakeup_new: sysbench:129460 [120] success=1 CPU:011 We tested this patch with following benchmarks: master: 'commit b3a9e3b9622a ("Linux 5.8-rc1")' 100 iterations of: perf bench -f simple futex wake -s -t 128 -w 1 Lower result is better | | BASELINE | +PATCH | DELTA (%) | |---------|------------|----------|-------------| | mean | 0.33 | 0.313 | +5.152 | | std (%) | 10.433 | 7.563 | | 100 iterations of: sysbench threads --threads=8 run Higher result is better | | BASELINE | +PATCH | DELTA (%) | |---------|------------|----------|-------------| | mean | 5235.02 | 5863.73 | +12.01 | | std (%) | 8.166 | 10.265 | | 100 iterations of: sysbench mutex --mutex-num=1 --threads=8 run Lower result is better | | BASELINE | +PATCH | DELTA (%) | |---------|------------|----------|-------------| | mean | 0.413 | 0.404 | +2.179 | | std (%) | 3.791 | 1.816 | | Signed-off-by: Peter Puhov --- kernel/sched/fair.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 02f323b85b6d..abcbdf80ee75 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8662,8 +8662,14 @@ static bool update_pick_idlest(struct sched_group *idlest, case group_has_spare: /* Select group with most idle CPUs */ - if (idlest_sgs->idle_cpus >= sgs->idle_cpus) + if (idlest_sgs->idle_cpus > sgs->idle_cpus) return false; + + /* Select group with lowest group_util */ + if (idlest_sgs->idle_cpus == sgs->idle_cpus && + idlest_sgs->group_util <= sgs->group_util) + return false; + break; } -- 2.20.1