Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp815413ybt; Wed, 24 Jun 2020 11:53:59 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyZ7KshHynot7I0/LbrrEeyVfeyIXHqK62olQKRi05g9JaiqdFXzjcN57SO6QkmNyLgv8ud X-Received: by 2002:a17:906:2b81:: with SMTP id m1mr25488900ejg.488.1593024839482; Wed, 24 Jun 2020 11:53:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1593024839; cv=none; d=google.com; s=arc-20160816; b=RBd28+Y66XbXBZd1l7m7qnmB7RF4OPtzIkHiX5NTgq3Gw3fSTHma473VBSHTQVqHOi Sj6m7xGPtrwo1SZCm6GOLtquSjvINb1u9kG4XPwCyEH7TL8Rc0SpI6pAwhAFswMbuvMN GJLGllthih7WkdRSEIihAycVSRK37nXftDniIwyXEYOd557uNiO3RXSU5izByhdv0uOD heA5HOr0B66NoLefOYb03cTvZe/k//vvbCUWX13ArVXlJtf0axTQTKOE3E45P8UMmRnt JyL+3QwdFq3/Ss7eQpbj+z8sH5Mft3+8MwxmJ3qR7dvyocQcnCvLyiAOIMdtg4jLxIDZ /YjA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=BSBHRDFdCQBaKLYhdcxFUXJwGlQwy6/ypVUCgPoA93I=; b=nKvRTXKmUXJx9Vv1jPcbyH8la5/6MN6/wEhTj5OV61T2WV8rgjXKmv2BB/7W4LxO89 o9Z3vRP20p6F8Aji6yTEZSblYT6mlq3f7XujdTfoy/aTY/WC2TYngGj6fb3IARDr35bC WWmaNGps+vDLfEhKrUBBMVdwRyGpRvyt5aOsE4CjjWs0FSlVirNguDHMvLnZeAZrn14O DNorRte4eMnmQNzFdwZ+ScrM9xyWNdMe7PbjuDp8PLOzsY6VUl4gUJoSXZCzjH5bxIJi TalVQLgw+DmyrvgVgMvGBHNrzeY4OCynAo4X/FnpmMsmoOOcHyFxWmRuv8nRnL8xviT9 j1LA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@posk.io header.s=google header.b=IVC+YmN+; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id dm27si12352082edb.308.2020.06.24.11.53.36; Wed, 24 Jun 2020 11:53:59 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@posk.io header.s=google header.b=IVC+YmN+; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2406153AbgFXSxU (ORCPT + 99 others); Wed, 24 Jun 2020 14:53:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35172 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2406100AbgFXSxQ (ORCPT ); Wed, 24 Jun 2020 14:53:16 -0400 Received: from mail-pg1-x542.google.com (mail-pg1-x542.google.com [IPv6:2607:f8b0:4864:20::542]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E0B5CC0613ED for ; Wed, 24 Jun 2020 11:53:15 -0700 (PDT) Received: by mail-pg1-x542.google.com with SMTP id u128so1818057pgu.13 for ; Wed, 24 Jun 2020 11:53:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=posk.io; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=BSBHRDFdCQBaKLYhdcxFUXJwGlQwy6/ypVUCgPoA93I=; b=IVC+YmN+SpA850yWceAg+Sfvabr8BLl2GroqtH/bHvP6VcOth5oTHm6xPQx2+yWYBx 3fOTNpinvG1rqtJk89Xx7gwUXOBLhVpF2avlpr/TuuiebHLA1vvXROg4DY/2TuDhCCuN eDYa4q1B/ddZcUwqqAjAQiPWz/G/XAxn3NRCvD+oBUSalBE7qsiNy0oMLicRj4IbLfJO ayhT4gLPkcEss+bDBs619rnvPmWU5UlXm5Iu5QIDpy1vxBVRIROLypKFRLaaQWuWJnnC kcdatfwib5jHYopQeujAGqiW6d0imUXxznyp2bXSfglKuK6ZOjOZfg7CquI6/9Cwo+Me yPAQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=BSBHRDFdCQBaKLYhdcxFUXJwGlQwy6/ypVUCgPoA93I=; b=Y+wbOtUY0AlnM4W1zykVUZylam1PAE+e9Ojr1LbgIriz035STZts6cBPWcuAK3qItw bhkBJ3ac2dpSgeJZY+pD2OTEWR+IXsYdxvf0wJy2mi3nsnOJl7XSSPRqMsyjWdVust7l 7P6TMDeCcYHdZ2RNS6bJwzngzOR+LT5vugrIFmiY3f9QhasxsNpDE9HddKQiT7PF8HMB bKDSpiwU++qbRa0/l+iQXdgZhKNdCTuhAk0CVOa6Edy/c9v1vBXKzp0C4O8spLkSv1D+ fUopXHqAzMn1pwJfBIbFebP0GLuteTgxYvn4inKKeFbgP1BOmmvmAO+6MwaQtx8uNc/k DwnQ== X-Gm-Message-State: AOAM532wwohh7lg9sBz6r9Eik3MpWerArPyNnmBWSji7s8ZYXY2j8eL5 wcyfNLVwysmqQsqzot5dlyIGQ6iXDTO7V/hD X-Received: by 2002:a63:ec05:: with SMTP id j5mr23545205pgh.109.1593024795172; Wed, 24 Jun 2020 11:53:15 -0700 (PDT) Received: from posk-x1c.lan (c-73-202-129-89.hsd1.ca.comcast.net. [73.202.129.89]) by smtp.gmail.com with ESMTPSA id q10sm22295815pfk.86.2020.06.24.11.53.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Jun 2020 11:53:14 -0700 (PDT) From: Peter Oskolkov To: Linux Kernel Mailing List , Thomas Gleixner , Ingo Molnar , Peter Zijlstra , Darren Hart , Vincent Guittot Cc: Peter Oskolkov , Andrei Vagin , Paul Turner , Ben Segall , Aaron Lu , kernel test robot Subject: [RFC PATCH 2/3 v3] futex/sched: add wake_up_process_prefer_current_cpu, use in FUTEX_SWAP Date: Wed, 24 Jun 2020 11:52:46 -0700 Message-Id: <20200624185247.13269-3-posk@posk.io> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200624185247.13269-1-posk@posk.io> References: <20200624185247.13269-1-posk@posk.io> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Peter Oskolkov This is an RFC! v2: fix undefined symbol error ifndef CONFIG_SMP, as Reported-by: kernel test robot As described in the previous patch in this patchset ("futex: introduce FUTEX_SWAP operation"), it is often beneficial to wake a task and run it on the same CPU where the current going to sleep task it running. Internally at Google, switchto_switch sycall not only migrates the wakee to the current CPU, but also moves the waker's load stats to the wakee, thus ensuring that the migration to the current CPU does not interfere with load balancing. switchto_switch also does the context switch into the wakee, bypassing schedule(). This patchset does not go that far yet, it simply migrates the wakee to the current CPU and calls schedule(). This can potentially interfere with load balancing but, on the other hand, the code is concise and simple and is easy to understand to start the discussion. If this approach is OK, then in follow-up patches I will try to fune-tune the behavior by adjusting load stats and schedule(): our internal switchto_switch is still about 2x faster than FUTEX_SWAP (see numbers below). And now about performance: futex_swap benchmark from the last patch in this patchset produces this typical output: $ ./futex_swap -i 100000 ------- running SWAP_WAKE_WAIT ----------- completed 100000 swap and back iterations in 820683263 ns: 4103 ns per swap PASS ------- running SWAP_SWAP ----------- completed 100000 swap and back iterations in 124034476 ns: 620 ns per swap PASS In the above, the first benchmark (SWAP_WAKE_WAIT) calls FUTEX_WAKE, then FUTEX_WAIT; the second benchmark (SWAP_SWAP) calls FUTEX_SWAP. If the benchmark is restricted to a single cpu: $ taskset -c 1 ./futex_swap -i 1000000 The numbers are very similar, as expected (with wake+wait being a bit slower than swap due to two vs one syscalls). Please also note that switchto_switch is about 2x faster than FUTEX_SWAP because it does a contex switch to the wakee immediately, bypassing schedule(), so this is one of the options I'll explore in further patches (if/when this initial patchset is accepted). Tested: see the last patch is this patchset. Signed-off-by: Peter Oskolkov --- include/linux/sched.h | 1 + kernel/futex.c | 9 ++++----- kernel/sched/core.c | 5 +++++ kernel/sched/fair.c | 3 +++ kernel/sched/sched.h | 1 + 5 files changed, 14 insertions(+), 5 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index b62e6aaf28f0..edda0ace89a3 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1719,6 +1719,7 @@ extern struct task_struct *find_get_task_by_vpid(pid_t nr); extern int wake_up_state(struct task_struct *tsk, unsigned int state); extern int wake_up_process(struct task_struct *tsk); +extern int wake_up_process_prefer_current_cpu(struct task_struct *tsk); extern void wake_up_new_task(struct task_struct *tsk); #ifdef CONFIG_SMP diff --git a/kernel/futex.c b/kernel/futex.c index 670d6d113561..95852d7e6036 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -2649,12 +2649,11 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q, */ if (!timeout || timeout->task) { if (next) { - /* - * wake_up_process() below will be replaced - * in the next patch with - * wake_up_process_prefer_current_cpu(). - */ +#ifdef CONFIG_SMP + wake_up_process_prefer_current_cpu(next); +#else wake_up_process(next); +#endif put_task_struct(next); next = NULL; } diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 8f360326861e..3dcb02f79328 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6253,6 +6253,11 @@ void sched_setnuma(struct task_struct *p, int nid) } #endif /* CONFIG_NUMA_BALANCING */ +int wake_up_process_prefer_current_cpu(struct task_struct *next) +{ + return try_to_wake_up(next, TASK_NORMAL, WF_CURRENT_CPU); +} + #ifdef CONFIG_HOTPLUG_CPU /* * Ensure that the idle task is using init_mm right before its CPU goes diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index cbcb2f71599b..8c958f430af7 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6664,6 +6664,9 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f int want_affine = 0; int sync = (wake_flags & WF_SYNC) && !(current->flags & PF_EXITING); + if ((wake_flags & WF_CURRENT_CPU) && cpumask_test_cpu(cpu, p->cpus_ptr)) + return cpu; + if (sd_flag & SD_BALANCE_WAKE) { record_wakee(p); diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 1d4e94c1e5fe..d90e7d2c388f 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1683,6 +1683,7 @@ static inline int task_on_rq_migrating(struct task_struct *p) #define WF_FORK 0x02 /* Child wakeup after fork */ #define WF_MIGRATED 0x04 /* Internal use, task got migrated */ #define WF_ON_RQ 0x08 /* Wakee is on_rq */ +#define WF_CURRENT_CPU 0x10 /* Prefer to move wakee to the current CPU */ /* * To aid in avoiding the subversion of "niceness" due to uneven distribution -- 2.25.1