Date: Wed, 27 Apr 2022 11:19:02 +0200 (CEST)
From: Christoph Lameter
To: Marcelo Tosatti
cc: linux-kernel@vger.kernel.org, Nitesh Lal, Nicolas Saenz Julienne, Frederic Weisbecker, Juri Lelli, Peter Zijlstra, Alex Belits, Peter Xu, Thomas Gleixner, Daniel Bristot de Oliveira, Oscar Shiang, linux-rdma@vger.kernel.org
Subject: Re: [patch v12 00/13] extensible prctl task isolation interface and vmstat sync
In-Reply-To: <20220315153132.717153751@fedora.localdomain>
References: <20220315153132.717153751@fedora.localdomain>

Ok I actually have started an open source project that may make use of the oneshot interface. This is a bridging tool between two RDMA protocols called ib2roce. See

https://gentwo.org/christoph/2022-bridging-rdma.pdf

The relevant code can be found at

https://github.com/clameter/rdma-core/tree/ib2roce/ib2roce

In particular look at the ib2roce.c source code. This is still under development.

The ib2roce bridging can run in a busy loop mode (-k option) where it spins on ibv_poll_cq(), which is an RDMA call to handle incoming packets without kernel interaction. See busyloop() in ib2roce.c.

Currently I have configured the system to use CONFIG_NOHZ_FULL.
With that I am able to reliably forward packets at a rate that saturates 100G Ethernet / EDR Infiniband from a single spinning thread.

Without CONFIG_NOHZ_FULL any slight disturbance causes the forwarding to fall behind, which leads to dramatic packet loss, since we are looking here at a potential data rate of 12.5 GByte/sec, or about 12.5 MByte per msec. If the kernel interrupts the forwarding for, say, 10 msecs then we fall behind by 125 MB, which would have to be buffered and processed by additional code. That complexity makes packet processing much slower, which could slow the forwarding down so far that a recovery is not possible should the data continue to arrive at line rate.

Isolation of the threads was done through the following kernel parameters:

nohz_full=8-15,24-31 rcu_nocbs=8-15,24-31 poll_spectre_v2=off numa_balancing=disable rcutree.kthread_prio=3 intel_pstate=disable nosmt

And systemd was configured with the following affinities:

system.conf:CPUAffinity=0-7,16-23

This means that the second socket will be generally free of tasks and kernel threads.

The NUMA configuration:

$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 94798 MB
node 0 free: 92000 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 96765 MB
node 1 free: 96082 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

I could modify busyloop() in ib2roce.c to use the oneshot mode via prctl provided by this patch instead of NOHZ_FULL.

What kind of metric could I use to show the difference in idleness or in the quality of the CPU isolation?

The ib2roce tool already has a CLI mode where one can monitor the latencies that the busyloop experiences. See the latency calculations in busyloop() and the CLI command "core". Stats can be reset via the "zap" command.
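One possible metric, sketched here purely as an illustration, is the longest gap between two consecutive passes of a spinning loop, converted into the amount of data that would have arrived at line rate during that gap. The names gap_stats, now_ns(), measure_gaps() and backlog_bytes() are hypothetical and are not taken from ib2roce.c; the loop below only reads the clock, whereas a real busyloop would poll the completion queue at the marked spot.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helpers for illustration; not code from ib2roce.c. */

struct gap_stats {
    uint64_t iterations;  /* loop passes observed */
    uint64_t max_gap_ns;  /* longest time between two consecutive passes */
    uint64_t over_limit;  /* passes whose gap exceeded limit_ns */
};

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Spin for at least duration_ns and record the gap seen on every pass. */
static struct gap_stats measure_gaps(uint64_t duration_ns, uint64_t limit_ns)
{
    struct gap_stats s = { 0, 0, 0 };
    uint64_t start, prev;

    assert(duration_ns > 0);
    start = now_ns();
    prev = start;

    for (;;) {
        uint64_t t = now_ns();
        uint64_t gap = t - prev;

        /* A real busyloop would poll for incoming packets here. */

        s.iterations++;
        if (gap > s.max_gap_ns)
            s.max_gap_ns = gap;
        if (gap > limit_ns)
            s.over_limit++;
        prev = t;
        if (t - start >= duration_ns)
            break;
    }
    return s;
}

/*
 * Bytes arriving during a stall of stall_ns at bytes_per_sec.
 * The product must fit in 64 bits; at 12.5 GByte/sec that holds
 * for stalls of up to about one second.
 */
static uint64_t backlog_bytes(uint64_t bytes_per_sec, uint64_t stall_ns)
{
    return bytes_per_sec * stall_ns / 1000000000ull;
}
```

With the figures from above, backlog_bytes(12500000000ull, 10000000ull) gives 125000000, which is the 125 MB backlog for a 10 msec interruption. Running measure_gaps() pinned to an isolated CPU under each configuration and comparing max_gap_ns and over_limit would be one way to compare the two setups, assuming the clock read itself does not enter the kernel on the platform in question.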
I can see the usefulness of the oneshot mode but (I am very very sorry) I still think that this patchset overdoes what is needed, and I fail to understand what the point of inheritance, per-syscall quiescing etc. is. Those cause needless overhead in syscall handling and increase the complexity of managing a busyloop.

Special handling when the scheduler switches a task? If tasks that are required to be low latency and undisturbed are being switched, then something went very very wrong with the system configuration, and the only thing I would suggest is to issue a kernel warning that this is not the way one should configure the system.