From: Andrei Vagin
Date: Fri, 13 Jan 2023 13:39:46 -0800
Subject: Re: [PATCH 2/5] sched: add WF_CURRENT_CPU and externise ttwu
To: Chen Yu
Cc: Andrei Vagin, Peter Zijlstra, Ingo Molnar, Vincent Guittot,
    Dietmar Eggemann, linux-kernel@vger.kernel.org, Kees Cook,
    Christian Brauner, Andy Lutomirski, Juri Lelli, Peter Oskolkov,
    Tycho Andersen, Will Drewry
References: <20230110213010.2683185-1-avagin@google.com>
    <20230110213010.2683185-3-avagin@google.com>

On Wed, Jan 11, 2023 at 11:36 PM Chen Yu wrote:
>
> On 2023-01-10 at 13:30:07 -0800, Andrei Vagin wrote:
> > From: Peter Oskolkov
> >
> > Add WF_CURRENT_CPU wake flag that advices the scheduler to
> > move the wakee to the current CPU. This is useful for fast on-CPU
> > context switching use cases.
> >
> > In addition, make ttwu external rather than static so that
> > the flag could be passed to it from outside of sched/core.c.
> >
> > Signed-off-by: Peter Oskolkov
> > Signed-off-by: Andrei Vagin
> > @@ -7380,6 +7380,10 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
> >  	if (wake_flags & WF_TTWU) {
> >  		record_wakee(p);
> >
> > +		if ((wake_flags & WF_CURRENT_CPU) &&
> > +		    cpumask_test_cpu(cpu, p->cpus_ptr))
> > +			return cpu;
> I agree that cross-CPU wake up brings pain to fast context switching
> use cases, especially on high core count system. We suffered from this
> issue as well, so previously we presented this issue as well. The difference
> is that we used some dynamic "WF_CURRENT_CPU" mechanism[1] to deal with it.
> That is, if the waker/wakee are both short duration tasks, let the waker wakes up
> the wakee on current CPU. So not only seccomp but also other components/workloads
> could benefit from this without having to set the WF_CURRENT_CPU flag.
>
> Link [1]:
> https://lore.kernel.org/lkml/cover.1671158588.git.yu.c.chen@intel.com/

Thanks for the link. I like the idea, but this change has no impact on
the seccomp notify case. I used the benchmark from the fifth patch in
this series. It is a ping-pong benchmark in which one process triggers
system calls, and another process handles them. It measures the number
of system calls that can be processed within a specified time slice.

$ cd tools/testing/selftests/seccomp/
$ make
$ ./seccomp_bpf 2>&1| grep user_notification_sync
# RUN global.user_notification_sync ...
# seccomp_bpf.c:4281:user_notification_sync:basic: 8489 nsec/syscall
# seccomp_bpf.c:4281:user_notification_sync:sync: 3078 nsec/syscall
# OK global.user_notification_sync
ok 51 global.user_notification_sync

The results are the same with and without your change.

I expected your change to bring the basic case up to the level of the
sync one. I did some experiments and found that we can achieve the
desired outcome if we move the "short-task" checks ahead of the checks
that consider waking up on prev_cpu. For example, with this patch:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2f89e44e237d..af20b58e3972 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6384,6 +6384,11 @@ static int wake_wide(struct task_struct *p)
 static int
 wake_affine_idle(int this_cpu, int prev_cpu, int sync)
 {
+	/* The only running task is a short duration one. */
+	if (cpu_rq(this_cpu)->nr_running == 1 &&
+	    is_short_task(cpu_curr(this_cpu)))
+		return this_cpu;
+
 	/*
 	 * If this_cpu is idle, it implies the wakeup is from interrupt
 	 * context. Only allow the move if cache is shared. Otherwise an
@@ -6405,11 +6410,6 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
 	if (available_idle_cpu(prev_cpu))
 		return prev_cpu;

-	/* The only running task is a short duration one. */
-	if (cpu_rq(this_cpu)->nr_running == 1 &&
-	    is_short_task(cpu_curr(this_cpu)))
-		return this_cpu;
-
 	return nr_cpumask_bits;
 }

@@ -6897,6 +6897,10 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	    asym_fits_cpu(task_util, util_min, util_max, target))
 		return target;

+	if (!has_idle_core && cpu_rq(target)->nr_running == 1 &&
+	    is_short_task(cpu_curr(target)) && is_short_task(p))
+		return target;
+
 	/*
 	 * If the previous CPU is cache affine and idle, don't be stupid:
 	 */

the basic test case shows almost the same results as the sync one:

$ ./seccomp_bpf 2>&1| grep user_notification_sync
# RUN global.user_notification_sync ...
# seccomp_bpf.c:4281:user_notification_sync:basic: 3082 nsec/syscall
# seccomp_bpf.c:4281:user_notification_sync:sync: 2690 nsec/syscall
# OK global.user_notification_sync
ok 51 global.user_notification_sync

If you want to do any experiments, you can find my tree here:
[1] https://github.com/avagin/linux-task-diag/tree/wip/seccom-notify-sync-and-shed-short-task

Thanks,
Andrei