Subject: Re: [RFC/RFT][PATCH v5] cpuidle: New timer events oriented governor for tickless systems
From: Giovanni Gherdovich
To: "Rafael J. Wysocki", Linux PM, Doug Smythies
Cc: Srinivas Pandruvada, Peter Zijlstra, LKML, Frederic Weisbecker, Mel Gorman, Daniel Lezcano
Date: Sat, 10 Nov 2018 20:10:01 +0100
Message-ID: <1541877001.17878.5.camel@suse.cz>
In-Reply-To: <102783770.7hZjAahU8c@aspire.rjw.lan>

On Thu, 2018-11-08 at 18:25 +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki
> Subject: [PATCH] cpuidle: New timer events oriented governor for tickless systems
>
> The venerable menu governor does some things that are quite
> questionable in my view.
>
> First, it includes timer wakeups in the pattern detection data and
> mixes them up with wakeups from other sources which in some cases
> causes it to expect what essentially would be a timer wakeup in a
> time frame in which no timer wakeups are possible (because it knows
> the time until the next timer event and that is later than the
> expected wakeup time).
>
> Second, it uses the extra exit latency limit based on the predicted
> idle duration and depending on the number of tasks waiting on I/O,
> even though those tasks may run on a different CPU when they are
> woken up.  Moreover, the time ranges used by it for the sleep length
> correction factors depend on whether or not there are tasks waiting
> on I/O, which again doesn't imply anything in particular, and they
> are not correlated to the list of available idle states in any way
> whatever.
>
> Also, the pattern detection code in menu may end up considering
> values that are too large to matter at all, in which cases running
> it is a waste of time.
>
> A major rework of the menu governor would be required to address
> these issues and the performance of at least some workloads (tuned
> specifically to the current behavior of the menu governor) is likely
> to suffer from that.  It is thus better to introduce an entirely new
> governor without them and let everybody use the governor that works
> better with their actual workloads.
>
> The new governor introduced here, the timer events oriented (TEO)
> governor, uses the same basic strategy as menu: it always tries to
> find the deepest idle state that can be used in the given conditions.
> However, it applies a different approach to that problem.
>
> First, it doesn't use "correction factors" for the time till the
> closest timer, but instead it tries to correlate the measured idle
> duration values with the available idle states and use that
> information to pick up the idle state that is most likely to "match"
> the upcoming CPU idle interval.
>
> Second, it doesn't take the number of "I/O waiters" into account at
> all and the pattern detection code in it avoids taking timer wakeups
> into account.  It also only uses idle duration values less than the
> current time till the closest timer (with the tick excluded) for that
> purpose.
>
> Signed-off-by: Rafael J. Wysocki
> ---
>
> v4 -> v5:
>  * Avoid using shallow idle states when the tick has been stopped already.
>
> v3 -> v4:
>  * Make the pattern detection avoid returning too early if the minimum
>    sample is too far from the average.
>  * Reformat the changelog (as requested by Peter).
>
> v2 -> v3:
>  * Simplify the pattern detection code and make it return a value
>    lower than the time to the closest timer if the majority of recent
>    idle intervals are below it regardless of their variance (that should
>    cause it to be slightly more aggressive).
>  * Do not count wakeups from state 0 due to the time limit in poll_idle()
>    as non-timer.
>
> [...]

[NOTE: the tables in this message are quite wide, ~130 columns. If this
doesn't get to you properly formatted you can read a copy of this message
at the URL https://beta.suse.com/private/ggherdovich/teo-eval/teo-eval.html ]

Hello Rafael,

I have results for v3 and v5. Regarding v4, I made a mistake and didn't get
valid data; as I saw v5 coming shortly after, I didn't rerun v4.

I'm replying to the v5 thread because that's where these results belong, but
I'm quoting your text from the v2 email at
https://lore.kernel.org/lkml/4168371.zz0pVZtGOY@aspire.rjw.lan so that it's
easier to follow along.
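Before the numbers, here is how I read the selection strategy your changelog
describes, as a minimal sketch in C. This is purely illustrative and
hypothetical: the struct and function names below are mine, not the patch's,
and the real heuristics are more involved.

/*
 * Minimal, hypothetical sketch of the strategy described above: start
 * from the deepest idle state whose target residency fits the known
 * time until the next timer, then prefer a shallower state if recent
 * measured idle durations matched it more often.  Not the patch's code.
 */
struct state {
	unsigned int target_residency_us;	/* min. residency for the state to pay off */
	unsigned int recent_hits;		/* recent idle intervals matching this state */
};

static int teo_like_select(const struct state *s, int nstates,
			   unsigned int time_to_next_timer_us)
{
	int i, idx = 0;

	/* Deepest state that still fits within the timer horizon. */
	for (i = 1; i < nstates; i++)
		if (s[i].target_residency_us <= time_to_next_timer_us)
			idx = i;

	/*
	 * If a shallower state matched the measured idle durations more
	 * often, pick it instead (a crude stand-in for TEO's heuristics).
	 */
	for (i = 0; i < idx; i++)
		if (s[i].recent_hits > s[idx].recent_hits)
			return i;

	return idx;
}

As far as I can tell from the version notes, most of the churn between v2 and
v5 is in how that measured-durations part is collected and weighted.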
The quick summary is:

---> sockperf on loopback over UDP, mode "throughput":
     this had a 12% regression in v2 on 48x-HASWELL-NUMA, which is completely
     recovered in v3 and v5. Good stuff.

---> dbench on xfs:
     this was down 16% in v2 on 48x-HASWELL-NUMA. On v5 we're at a 10%
     regression. Slight improvement. What's really hurting here is the
     single-client scenario.

---> netperf-udp on loopback:
     had a 6% regression in v2 on 8x-SKYLAKE-UMA, which is the same as what
     happens in v5.

---> tbench on loopback:
     was down 10% in v2 on 8x-SKYLAKE-UMA, now slightly worse in v5 with a 12%
     regression. As in dbench, it's at low client counts that the results are
     worst. Note that this machine is different from the one that has the
     dbench regression.

A more detailed report follows below.

I maintain my original opinion from v2 that this governor is largely
performance-neutral and I'm not overly worried about the numbers above:

* results change from machine to machine: dbench is down 10% on
  48x-HASWELL-NUMA, but it also gives you the largest win on the board with a
  4% improvement on 8x-SKYLAKE-UMA. All regressions I mention only manifest on
  one out of three machines.

* similar benchmarks give contradicting results: dbench seems highly sensitive
  to this patch, but pgbench, sqlite, and fio are not. netperf-udp is slightly
  down on 48x-HASWELL-NUMA but sockperf-udp-throughput has benefited from v5
  on that same machine.

To raise an alert from the performance angle I would have to see red on my
board from an entire category of benchmarks (i.e. I/O, or networking, or
scheduler-intensive, etc.) and on a variety of hardware configurations. That's
not happening here.

On Sun, 2018-11-04 at 11:06 +0100, Rafael J. Wysocki wrote:
> On Wednesday, October 31, 2018 7:36:21 PM CET Giovanni Gherdovich wrote:
> >
> > [...]
> > I've tested your patches applying them on v4.18 (plus the backport
> > necessary for v2 as Doug helpfully noted), just because it was the latest
> > release when I started preparing this.

I did the same for v3 and v5: baseline is v4.18, using that backport from
linux-next.

> >
> > I've tested it on three machines, with different generations of Intel CPUs:
> >
> > * single socket E3-1240 v5 (Skylake 8 cores, which I'll call 8x-SKYLAKE-UMA)
> > * two sockets E5-2698 v4 (Broadwell 80 cores, 80x-BROADWELL-NUMA from here onwards)
> > * two sockets E5-2670 v3 (Haswell 48 cores, 48x-HASWELL-NUMA from here onwards)

Same machines.
> >
> > BENCHMARKS WITH NEUTRAL RESULTS
> > ===============================
> >
> > These are the workloads where no noticeable difference is measured (on both
> > v1 and v2, all machines), together with the corresponding MMTests[1]
> > configuration file name:
> >
> > * pgbench read-only on xfs, pgbench read/write on xfs
> >     * global-dhp__db-pgbench-timed-ro-small-xfs
> >     * global-dhp__db-pgbench-timed-rw-small-xfs
> > * siege
> >     * global-dhp__http-siege
> > * hackbench, pipetest
> >     * global-dhp__scheduler-unbound
> > * Linux kernel compilation
> >     * global-dhp__workload_kerndevel-xfs
> > * NASA Parallel Benchmarks, C-Class (linear algebra; run both with OpenMP
> >   and OpenMPI, over xfs)
> >     * global-dhp__nas-c-class-mpi-full-xfs
> >     * global-dhp__nas-c-class-omp-full
> > * FIO (Flexible IO) in several configurations
> >     * global-dhp__io-fio-randread-async-randwrite-xfs
> >     * global-dhp__io-fio-randread-async-seqwrite-xfs
> >     * global-dhp__io-fio-seqread-doublemem-32k-4t-xfs
> >     * global-dhp__io-fio-seqread-doublemem-4k-4t-xfs
> > * netperf on loopback over TCP
> >     * global-dhp__network-netperf-unbound
>
> The above is great to know.

All of the above are confirmed, plus we can add to the group of neutral
benchmarks:

* xfsrepair
    * global-dhp__io-xfsrepair-xfs
* sqlite (insert operations on xfs)
    * global-dhp__db-sqlite-insert-medium-xfs
* schbench
    * global-dhp__workload_schbench
* gitsource on xfs (git unit tests, shell intensive)
    * global-dhp__workload_shellscripts-xfs

>
> > BENCHMARKS WITH NON-NEUTRAL RESULTS: OVERVIEW
> > =============================================
> >
> > These are benchmarks which exhibit a variation in their performance;
> > you'll see the magnitude of the changes is moderate and it's highly variable
> > from machine to machine. All percentages refer to the v4.18 baseline. In
> > more than one case the Haswell machine seems to prefer v1 to v2.
> >
> > [...]
> >
> > * netperf on loopback over UDP
> >     * global-dhp__network-netperf-unbound
> >
> >                                     teo-v1          teo-v2
> >             -------------------------------------------------
> >             8x-SKYLAKE-UMA          no change       6% worse
> >             80x-BROADWELL-NUMA      1% worse        4% worse
> >             48x-HASWELL-NUMA        3% better       5% worse

New data for netperf-udp, as 8x-SKYLAKE-UMA looked slightly off:

* netperf on loopback over UDP
    * global-dhp__network-netperf-unbound

                        teo-v1          teo-v2          teo-v3          teo-v5
  ------------------------------------------------------------------------------
  8x-SKYLAKE-UMA        no change       6% worse        4% worse        6% worse
  80x-BROADWELL-NUMA    1% worse        4% worse        no change       no change
  48x-HASWELL-NUMA      3% better       5% worse        7% worse        5% worse

> > [...]
> >
> > * sockperf on loopback over UDP, mode "throughput"
> >     * global-dhp__network-sockperf-unbound
>
> Generally speaking, I'm not worried about single-digit percent differences,
> because overall they tend to fall into the noise range in the grand picture.
>
> >                                     teo-v1          teo-v2
> >             -------------------------------------------------
> >             8x-SKYLAKE-UMA          1% worse        1% worse
> >             80x-BROADWELL-NUMA      3% better       2% better
> >             48x-HASWELL-NUMA        4% better       12% worse
>
> But the 12% difference here is slightly worrisome.

Following up on the above:

* sockperf on loopback over UDP, mode "throughput"
    * global-dhp__network-sockperf-unbound

                        teo-v1          teo-v2          teo-v3          teo-v5
  -------------------------------------------------------------------------------
  8x-SKYLAKE-UMA        1% worse        1% worse        1% worse        1% worse
  80x-BROADWELL-NUMA    3% better       2% better       5% better       3% worse
  48x-HASWELL-NUMA      4% better       12% worse       no change       no change

> >
> > [...]
> >
> > * dbench on xfs
> >         * global-dhp__io-dbench4-async-xfs
> >
> >                                     teo-v1          teo-v2
> >             -------------------------------------------------
> >             8x-SKYLAKE-UMA          3% better       4% better
> >             80x-BROADWELL-NUMA      no change       no change
> >             48x-HASWELL-NUMA        6% worse        16% worse
>
> And same here.

With new data:

* dbench on xfs
    * global-dhp__io-dbench4-async-xfs

                        teo-v1          teo-v2          teo-v3          teo-v5
  -------------------------------------------------------------------------------
  8x-SKYLAKE-UMA        3% better       4% better       6% better       4% better
  80x-BROADWELL-NUMA    no change       no change       1% worse        3% worse
  48x-HASWELL-NUMA      6% worse        16% worse       8% worse        10% worse

> > * tbench on loopback
> >     * global-dhp__network-tbench
> >
> >                                     teo-v1          teo-v2
> >             -------------------------------------------------
> >             8x-SKYLAKE-UMA          1% worse        10% worse
> >             80x-BROADWELL-NUMA      1% worse        1% worse
> >             48x-HASWELL-NUMA        1% worse        2% worse

Update on tbench:

* tbench on loopback
    * global-dhp__network-tbench

                        teo-v1          teo-v2          teo-v3          teo-v5
  ------------------------------------------------------------------------------
  8x-SKYLAKE-UMA        1% worse        10% worse       11% worse       12% worse
  80x-BROADWELL-NUMA    1% worse        1% worse        no change       1% worse
  48x-HASWELL-NUMA      1% worse        2% worse        1% worse        1% worse

> > [...]
> >
> > BENCHMARKS WITH NON-NEUTRAL RESULTS: DETAIL
> > ===========================================
> >
> > Now some more detail. Each benchmark is run in a variety of configurations
> > (e.g. number of threads, number of concurrent connections and so forth),
> > each of them giving a result. What you see above is the geometric mean of
> > "sub-results"; below is the detailed view where there was a regression
> > larger than 5% (either in v1 or v2, on any of the machines). That means
> > I'll exclude xfsrepair, sqlite, schbench and the git unit tests "gitsource"
> > that have negligible swings from the baseline.
> >
> > In all tables asterisks indicate a statement about statistical
> > significance: the difference with baseline has a p-value smaller than 0.1
> > (small p-values indicate that the difference is real and not just random
> > noise).
> >
> > NETPERF-UDP
> > ===========
> > NOTES: Test run in mode "stream" over UDP. The varying parameter is the
> >     message size in bytes. Each measurement is taken 5 times and the
> >     harmonic mean is reported.
> > MEASURES: Throughput in MBits/second, both on the sender and on the receiver end.
> > HIGHER is better
> >
> > machine: 8x-SKYLAKE-UMA
> >                                      4.18.0                 4.18.0                 4.18.0
> >                                     vanilla                 teo-v1        teo-v2+backport
> > -----------------------------------------------------------------------------------------
> > Hmean     send-64         362.27 (   0.00%)      362.87 (   0.16%)      318.85 * -11.99%*
> > Hmean     send-128        723.17 (   0.00%)      723.66 (   0.07%)      660.96 *  -8.60%*
> > Hmean     send-256       1435.24 (   0.00%)     1427.08 (  -0.57%)     1346.22 *  -6.20%*
> > Hmean     send-1024      5563.78 (   0.00%)     5529.90 *  -0.61%*     5228.28 *  -6.03%*
> > Hmean     send-2048     10935.42 (   0.00%)    10809.66 *  -1.15%*    10521.14 *  -3.79%*
> > Hmean     send-3312     16898.66 (   0.00%)    16539.89 *  -2.12%*    16240.87 *  -3.89%*
> > Hmean     send-4096     19354.33 (   0.00%)    19185.43 (  -0.87%)    18600.52 *  -3.89%*
> > Hmean     send-8192     32238.80 (   0.00%)    32275.57 (   0.11%)    29850.62 *  -7.41%*
> > Hmean     send-16384    48146.75 (   0.00%)    49297.23 *   2.39%*    48295.51 (   0.31%)
> > Hmean     recv-64         362.16 (   0.00%)      362.87 (   0.19%)      318.82 * -11.97%*
> > Hmean     recv-128        723.01 (   0.00%)      723.66 (   0.09%)      660.89 *  -8.59%*
> > Hmean     recv-256       1435.06 (   0.00%)     1426.94 (  -0.57%)     1346.07 *  -6.20%*
> > Hmean     recv-1024      5562.68 (   0.00%)     5529.90 *  -0.59%*     5228.28 *  -6.01%*
> > Hmean     recv-2048     10934.36 (   0.00%)    10809.66 *  -1.14%*    10519.89 *  -3.79%*
> > Hmean     recv-3312     16898.65 (   0.00%)    16538.21 *  -2.13%*    16240.86 *  -3.89%*
> > Hmean     recv-4096     19351.99 (   0.00%)    19183.17 (  -0.87%)    18598.33 *  -3.89%*
> > Hmean     recv-8192     32238.74 (   0.00%)    32275.13 (   0.11%)    29850.39 *  -7.41%*
> > Hmean     recv-16384    48146.59 (   0.00%)    49296.23 *   2.39%*    48295.03 (   0.31%)
>
> That is a bit worse than I would like it to be TBH.
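A quick note for reading the tables in the rest of this mail: Hmean rows use
the harmonic mean, Amean rows (dbench, below) the arithmetic mean, and the
per-benchmark overviews earlier are geometric means of sub-results. As a
self-contained illustration, in plain C (not MMTests code; the sample values
are made up):

#include <math.h>
#include <stdio.h>

/* Harmonic mean: n / sum(1/x_i); used for throughput-like Hmean rows. */
static double hmean(const double *v, int n)
{
	double s = 0.0;
	for (int i = 0; i < n; i++)
		s += 1.0 / v[i];
	return n / s;
}

/* Geometric mean: exp(mean(log x_i)); used to summarize sub-results. */
static double gmean(const double *v, int n)
{
	double s = 0.0;
	for (int i = 0; i < n; i++)
		s += log(v[i]);
	return exp(s / n);
}

int main(void)
{
	/* Made-up samples standing in for the 5 runs of one data point. */
	double runs[] = { 362.1, 362.4, 362.3, 362.2, 362.4 };

	/* Compile with -lm. */
	printf("Hmean %.2f  Gmean %.2f\n", hmean(runs, 5), gmean(runs, 5));
	return 0;
}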
Update on netperf-udp:

machine: 8x-SKYLAKE-UMA
                                     4.18.0                 4.18.0                 4.18.0                 4.18.0                 4.18.0
                                    vanilla                    teo        teo-v2+backport        teo-v3+backport        teo-v5+backport
---------------------------------------------------------------------------------------------------------------------------------------
Hmean     send-64         362.27 (   0.00%)      362.87 (   0.16%)      318.85 * -11.99%*      347.08 *  -4.19%*      333.48 *  -7.95%*
Hmean     send-128        723.17 (   0.00%)      723.66 (   0.07%)      660.96 *  -8.60%*      676.46 *  -6.46%*      650.71 * -10.02%*
Hmean     send-256       1435.24 (   0.00%)     1427.08 (  -0.57%)     1346.22 *  -6.20%*     1359.59 *  -5.27%*     1323.83 *  -7.76%*
Hmean     send-1024      5563.78 (   0.00%)     5529.90 *  -0.61%*     5228.28 *  -6.03%*     5382.04 *  -3.27%*     5271.99 *  -5.24%*
Hmean     send-2048     10935.42 (   0.00%)    10809.66 *  -1.15%*    10521.14 *  -3.79%*    10610.29 *  -2.97%*    10544.58 *  -3.57%*
Hmean     send-3312     16898.66 (   0.00%)    16539.89 *  -2.12%*    16240.87 *  -3.89%*    16271.23 *  -3.71%*    15968.89 *  -5.50%*
Hmean     send-4096     19354.33 (   0.00%)    19185.43 (  -0.87%)    18600.52 *  -3.89%*    18692.16 *  -3.42%*    18408.69 *  -4.89%*
Hmean     send-8192     32238.80 (   0.00%)    32275.57 (   0.11%)    29850.62 *  -7.41%*    30066.83 *  -6.74%*    29824.62 *  -7.49%*
Hmean     send-16384    48146.75 (   0.00%)    49297.23 *   2.39%*    48295.51 (   0.31%)    48800.37 *   1.36%*    48247.73 (   0.21%)
Hmean     recv-64         362.16 (   0.00%)      362.87 (   0.19%)      318.82 * -11.97%*      347.07 *  -4.17%*      333.48 *  -7.92%*
Hmean     recv-128        723.01 (   0.00%)      723.66 (   0.09%)      660.89 *  -8.59%*      676.39 *  -6.45%*      650.63 * -10.01%*
Hmean     recv-256       1435.06 (   0.00%)     1426.94 (  -0.57%)     1346.07 *  -6.20%*     1359.45 *  -5.27%*     1323.81 *  -7.75%*
Hmean     recv-1024      5562.68 (   0.00%)     5529.90 *  -0.59%*     5228.28 *  -6.01%*     5381.37 *  -3.26%*     5271.45 *  -5.24%*
Hmean     recv-2048     10934.36 (   0.00%)    10809.66 *  -1.14%*    10519.89 *  -3.79%*    10610.28 *  -2.96%*    10544.58 *  -3.56%*
Hmean     recv-3312     16898.65 (   0.00%)    16538.21 *  -2.13%*    16240.86 *  -3.89%*    16269.34 *  -3.72%*    15967.13 *  -5.51%*
Hmean     recv-4096     19351.99 (   0.00%)    19183.17 (  -0.87%)    18598.33 *  -3.89%*    18690.13 *  -3.42%*    18407.45 *  -4.88%*
Hmean     recv-8192     32238.74 (   0.00%)    32275.13 (   0.11%)    29850.39 *  -7.41%*    30062.78 *  -6.75%*    29824.30 *  -7.49%*
Hmean     recv-16384    48146.59 (   0.00%)    49296.23 *   2.39%*    48295.03 (   0.31%)    48786.88 *   1.33%*    48246.71 (   0.21%)

Here is a plot of the raw benchmark data, where you can better see the
distribution and variability of the results:
https://beta.suse.com/private/ggherdovich/teo-eval/teo-eval.html#netperf-udp

> > [...]
> >
> > SOCKPERF-UDP-THROUGHPUT
> > =======================
> > NOTES: Test run in mode "throughput" over UDP. The varying parameter is the
> >     message size.
> > MEASURES: Throughput, in MBits/second
> > HIGHER is better
> >
> > machine: 48x-HASWELL-NUMA
> >                               4.18.0                 4.18.0                 4.18.0
> >                              vanilla                 teo-v1        teo-v2+backport
> > ----------------------------------------------------------------------------------
> > Hmean     14        48.16 (   0.00%)       50.94 *   5.77%*       42.50 * -11.77%*
> > Hmean     100      346.77 (   0.00%)      358.74 *   3.45%*      303.31 * -12.53%*
> > Hmean     300     1018.06 (   0.00%)     1053.75 *   3.51%*      895.55 * -12.03%*
> > Hmean     500     1693.07 (   0.00%)     1754.62 *   3.64%*     1489.61 * -12.02%*
> > Hmean     850     2853.04 (   0.00%)     2948.73 *   3.35%*     2473.50 * -13.30%*
>
> Well, in this case the consistent improvement in v1 turned into a consistent decline
> in the v2, and over 10% for that matter.  Needs improvement IMO.

Update: this one got resolved in v5.

machine: 48x-HASWELL-NUMA
                              4.18.0                 4.18.0                 4.18.0                 4.18.0                 4.18.0
                             vanilla                    teo        teo-v2+backport        teo-v3+backport        teo-v5+backport
--------------------------------------------------------------------------------------------------------------------------------
Hmean     14        48.16 (   0.00%)       50.94 *   5.77%*       42.50 * -11.77%*       48.91 *   1.55%*       49.06 *   1.87%*
Hmean     100      346.77 (   0.00%)      358.74 *   3.45%*      303.31 * -12.53%*      350.75 (   1.15%)      347.52 (   0.22%)
Hmean     300     1018.06 (   0.00%)     1053.75 *   3.51%*      895.55 * -12.03%*     1014.00 (  -0.40%)     1023.99 (   0.58%)
Hmean     500     1693.07 (   0.00%)     1754.62 *   3.64%*     1489.61 * -12.02%*     1688.50 (  -0.27%)     1698.43 (   0.32%)
Hmean     850     2853.04 (   0.00%)     2948.73 *   3.35%*     2473.50 * -13.30%*     2836.13 (  -0.59%)     2767.66 *  -2.99%*

Plots of the raw data are at
https://beta.suse.com/private/ggherdovich/teo-eval/teo-eval.html#sockperf-udp-throughput

>
> > DBENCH4
> > =======
> > NOTES: asynchronous IO; varies the number of clients up to NUMCPUS*8.
> > MEASURES: latency (millisecs)
> > LOWER is better
> >
> > machine: 48x-HASWELL-NUMA
> >                               4.18.0                 4.18.0                 4.18.0
> >                              vanilla                 teo-v1        teo-v2+backport
> > ----------------------------------------------------------------------------------
> > Amean      1        37.15 (   0.00%)       50.10 ( -34.86%)       39.02 (  -5.03%)
> > Amean      2        43.75 (   0.00%)       45.50 (  -4.01%)       44.36 (  -1.39%)
> > Amean      4        54.42 (   0.00%)       58.85 (  -8.15%)       58.17 (  -6.89%)
> > Amean      8        75.72 (   0.00%)       74.25 (   1.94%)       82.76 (  -9.30%)
> > Amean      16      116.56 (   0.00%)      119.88 (  -2.85%)      164.14 ( -40.82%)
> > Amean      32      570.02 (   0.00%)      561.92 (   1.42%)      681.94 ( -19.63%)
> > Amean      64     3185.20 (   0.00%)     3291.80 (  -3.35%)     4337.43 ( -36.17%)
>
> This one too.
Update:

machine: 48x-HASWELL-NUMA
                              4.18.0                 4.18.0                 4.18.0                 4.18.0                 4.18.0
                             vanilla                    teo        teo-v2+backport        teo-v3+backport        teo-v5+backport
--------------------------------------------------------------------------------------------------------------------------------
Amean      1        37.15 (   0.00%)       50.10 ( -34.86%)       39.02 (  -5.03%)       52.24 ( -40.63%)       51.62 ( -38.96%)
Amean      2        43.75 (   0.00%)       45.50 (  -4.01%)       44.36 (  -1.39%)       47.25 (  -8.00%)       44.20 (  -1.03%)
Amean      4        54.42 (   0.00%)       58.85 (  -8.15%)       58.17 (  -6.89%)       55.12 (  -1.29%)       58.07 (  -6.70%)
Amean      8        75.72 (   0.00%)       74.25 (   1.94%)       82.76 (  -9.30%)       78.63 (  -3.84%)       85.33 ( -12.68%)
Amean      16      116.56 (   0.00%)      119.88 (  -2.85%)      164.14 ( -40.82%)      124.87 (  -7.13%)      124.54 (  -6.85%)
Amean      32      570.02 (   0.00%)      561.92 (   1.42%)      681.94 ( -19.63%)      568.93 (   0.19%)      571.23 (  -0.21%)
Amean      64     3185.20 (   0.00%)     3291.80 (  -3.35%)     4337.43 ( -36.17%)     3181.13 (   0.13%)     3382.48 (  -6.19%)

It doesn't do well in the single-client scenario; v2 was a lot better at
that, but on the other hand v2 suffered at saturation (64 clients on a
48-core box). Plot at
https://beta.suse.com/private/ggherdovich/teo-eval/teo-eval.html#dbench4

>
> > TBENCH4
> > =======
> > NOTES: networking counterpart of dbench. Varies the number of clients up to NUMCPUS*4
> > MEASURES: Throughput, MB/sec
> > HIGHER is better
> >
> > machine: 8x-SKYLAKE-UMA
> >                                     4.18.0                 4.18.0                 4.18.0
> >                                    vanilla                    teo        teo-v2+backport
> > ----------------------------------------------------------------------------------------
> > Hmean     mb/sec-1       620.52 (   0.00%)      613.98 *  -1.05%*      502.47 * -19.03%*
> > Hmean     mb/sec-2      1179.05 (   0.00%)     1112.84 *  -5.62%*      820.57 * -30.40%*
> > Hmean     mb/sec-4      2072.29 (   0.00%)     2040.55 *  -1.53%*     2036.11 *  -1.75%*
> > Hmean     mb/sec-8      4238.96 (   0.00%)     4205.01 *  -0.80%*     4124.59 *  -2.70%*
> > Hmean     mb/sec-16     3515.96 (   0.00%)     3536.23 *   0.58%*     3500.02 *  -0.45%*
> > Hmean     mb/sec-32     3452.92 (   0.00%)     3448.94 *  -0.12%*     3428.08 *  -0.72%*
>
> And same here.
New data:

machine: 8x-SKYLAKE-UMA
                                    4.18.0                 4.18.0                 4.18.0                 4.18.0                 4.18.0
                                   vanilla                    teo        teo-v2+backport        teo-v3+backport        teo-v5+backport
--------------------------------------------------------------------------------------------------------------------------------------
Hmean     mb/sec-1       620.52 (   0.00%)      613.98 *  -1.05%*      502.47 * -19.03%*      492.77 * -20.59%*      464.52 * -25.14%*
Hmean     mb/sec-2      1179.05 (   0.00%)     1112.84 *  -5.62%*      820.57 * -30.40%*      831.23 * -29.50%*      780.97 * -33.76%*
Hmean     mb/sec-4      2072.29 (   0.00%)     2040.55 *  -1.53%*     2036.11 *  -1.75%*     2016.97 *  -2.67%*     2019.79 *  -2.53%*
Hmean     mb/sec-8      4238.96 (   0.00%)     4205.01 *  -0.80%*     4124.59 *  -2.70%*     4098.06 *  -3.32%*     4171.64 *  -1.59%*
Hmean     mb/sec-16     3515.96 (   0.00%)     3536.23 *   0.58%*     3500.02 *  -0.45%*     3438.60 *  -2.20%*     3456.89 *  -1.68%*
Hmean     mb/sec-32     3452.92 (   0.00%)     3448.94 *  -0.12%*     3428.08 *  -0.72%*     3369.30 *  -2.42%*     3430.09 *  -0.66%*

Again, the pain point is at low client counts; v1 OTOH was neutral. Plot at
https://beta.suse.com/private/ggherdovich/teo-eval/teo-eval.html#tbench4

Cheers,
Giovanni