From: Chen Yu
To: Peter Zijlstra, Vincent Guittot, Tim Chen, Mel Gorman
Cc: Juri Lelli, Rik van Riel, Aaron Lu, Abel Wu, K Prateek Nayak,
    Yicong Yang, "Gautham R . Shenoy", Ingo Molnar, Dietmar Eggemann,
    Steven Rostedt, Ben Segall, Daniel Bristot de Oliveira,
    Valentin Schneider, Hillf Danton, Honglei Wang, Len Brown, Chen Yu,
    Tianchen Ding, Joel Fernandes, Josh Don,
    linux-kernel@vger.kernel.org, Chen Yu
Subject: [RFC PATCH v4 0/2] sched/fair: Choose the CPU where short task is running during wake up
Date: Fri, 16 Dec 2022 14:08:50 +0800

The main purpose of this change is to avoid unnecessary cross-CPU
wakeups. Frequent cross-CPU wakeups significantly hurt some workloads,
especially on high core count systems.

This patch set inhibits cross-CPU wakeups by placing the wakee on the
waking CPU or the previous CPU when both the waker and the wakee are
short-duration tasks.

The first patch introduces the definition of a short-duration task.
The second patch uses that definition to choose the local or previous
CPU for the wakee.

Changes since v3:
1. Honglei and Josh were concerned that the threshold for short task
   duration could be too long. Decreased the threshold from
   sysctl_sched_min_granularity to (sysctl_sched_min_granularity / 8);
   the '8' comes from get_update_sysctl_factor().
2. Export p->se.dur_avg to /proc/{pid}/sched, per Yicong's suggestion.
3. Move the calculation of the average duration from
   put_prev_task_fair() to dequeue_task_fair(). In v3,
   put_prev_task_fair() was not invoked by pick_next_task_fair() in
   the fast path, so dur_avg could not be updated in time.
4. Fix the comment in PATCH 2/2 to state that
   "WRITE_ONCE(CPU1->ttwu_pending, 1);" on CPU0 happens earlier than
   CPU1 getting "ttwu_list->p0", per Tianchen.
5. Move the scan for a CPU with a short-duration task from
   select_idle_cpu() to select_idle_sibling(), because no CPU scan is
   involved, per Yicong.

Changes since v2:
1. Peter suggested comparing the duration of the waker with the cost
   of scanning for an idle CPU: if the cost is higher than the task
   duration, do not waste time finding an idle CPU, and choose the
   local or previous CPU directly. A prototype was created based on
   this suggestion. However, according to the test results, this
   prototype did not inhibit cross-CPU wakeups and did not bring any
   improvement, because the cost of finding an idle CPU is small in
   the problematic scenario. The root cause of the problem is a race
   condition between scanning for an idle CPU and task enqueue (please
   refer to the commit log in PATCH 2/2). So v3 does not change the
   core logic of v2, apart from some refinement based on Peter's
   suggestion.
2. Simplify the logic to record the task duration, per Peter and
   Abel's suggestion.

This change brings overall improvement on some microbenchmarks, on
both Intel and AMD platforms.
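
For readers who do not want to open the patches, the short-duration
check and the wakeup decision described above can be sketched in
userspace C roughly as follows. This is an illustration only, not the
code in the patches: struct task, min_granularity_ns and its value,
update_dur_avg() and skip_idle_scan() are hypothetical names, and the
1/8-weight running average is an assumption (the cover letter does not
say how dur_avg is computed). Only dur_avg and the
(sysctl_sched_min_granularity / 8) threshold are taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for sysctl_sched_min_granularity, in ns.
 * The value is an assumption chosen for this demo. */
static const uint64_t min_granularity_ns = 3000000ULL;

/* Hypothetical per-task state; dur_avg mirrors p->se.dur_avg. */
struct task {
	uint64_t dur_avg;	/* average run duration in ns, 0 = no sample */
};

/* Fold one run-duration sample into the average.
 * Assumption: avg += (sample - avg) / 8. */
static void update_dur_avg(struct task *t, uint64_t sample_ns)
{
	assert(t != NULL);
	int64_t diff = (int64_t)sample_ns - (int64_t)t->dur_avg;

	t->dur_avg = (uint64_t)((int64_t)t->dur_avg + diff / 8);
}

/* A task is "short" if it has a sample and its average duration does
 * not exceed min_granularity / 8 (the v4 threshold). */
static bool is_short_task(const struct task *t)
{
	assert(t != NULL);
	return t->dur_avg != 0 && t->dur_avg <= min_granularity_ns / 8;
}

/* True if the wakee should stay on the waking or previous CPU instead
 * of being moved to an idle CPU found by a scan: both the waker and
 * the wakee must be short-duration tasks. */
static bool skip_idle_scan(const struct task *waker,
			   const struct task *wakee)
{
	return is_short_task(waker) && is_short_task(wakee);
}
```

With the assumed 3 ms granularity, the threshold is 375 us, so a task
whose average is 100 us counts as short, while one averaging 1 ms does
not and would still go through the normal idle CPU search.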

v3: https://lore.kernel.org/lkml/cover.1669862147.git.yu.c.chen@intel.com/
v2: https://lore.kernel.org/all/cover.1666531576.git.yu.c.chen@intel.com/
v1: https://lore.kernel.org/lkml/20220915165407.1776363-1-yu.c.chen@intel.com/

Chen Yu (2):
  sched/fair: Introduce short duration task check
  sched/fair: Choose the CPU where short task is running during wake up

 include/linux/sched.h   |  3 +++
 kernel/sched/core.c     |  2 ++
 kernel/sched/debug.c    |  1 +
 kernel/sched/fair.c     | 32 ++++++++++++++++++++++++++++++++
 kernel/sched/features.h |  1 +
 5 files changed, 39 insertions(+)

-- 
2.25.1