Date: Fri, 11 Dec 2015 15:17:50 +0100
Subject: sched : performance regression 24% between 4.4rc4 and 4.3 kernel
From: Jirka Hladky
To: linux-kernel@vger.kernel.org

Hello,

we are doing performance testing of the new kernel scheduler (commit
53528695ff6d8b77011bc818407c13e30914a946). In most cases we see
performance improvements compared to the 4.3 kernel, with the exception
of the stream benchmark when running on a 4 NUMA node server.

When we run 4 stream benchmark processes on a 4 NUMA node server and
compare the total performance, we see a drop of about 24% compared to
the 4.3 kernel. The cause is that 2 stream benchmarks are running on the
same NUMA node while 1 NUMA node does not run any stream benchmark. With
kernel 4.3, the load is distributed evenly among all 4 NUMA nodes. When
two stream benchmarks run on the same NUMA node, the runtime is almost
twice as long as when one stream benchmark runs alone on a NUMA node.
See log files [1] below.

Please see the graph comparing stream benchmark results between kernel
4.3 and 4.4rc4 (for the legend see [2] below).

https://jhladky.fedorapeople.org/sched_stream_kernel_4.3vs4.4rc4/Stream_benchmark_on_4_NUMA_node_server_4.3vs4.4rc4_kernel.png

Could you please help us identify the root cause of this regression? We
don't have the skills to fix the problem ourselves, but we will be more
than happy to test any proposed patch for this issue.

Thanks a lot for your help on that!
Jirka

Further details:

[1] Log files can be downloaded here:
https://jhladky.fedorapeople.org/sched_stream_kernel_4.3vs4.4rc4/4.4RC4_stream_log_files.tar.bz2

$ grep "User time" *log
stream.defaultRun.004streams.loop01.instance001.log:User time: 12.370 seconds
stream.defaultRun.004streams.loop01.instance002.log:User time: 10.560 seconds
stream.defaultRun.004streams.loop01.instance003.log:User time: 19.330 seconds
stream.defaultRun.004streams.loop01.instance004.log:User time: 17.820 seconds

$ grep "NUMA nodes:" *log
stream.defaultRun.004streams.loop01.instance001.log:NUMA nodes: 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
stream.defaultRun.004streams.loop01.instance002.log:NUMA nodes: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
stream.defaultRun.004streams.loop01.instance003.log:NUMA nodes: 3 3 3 3 3 3 3 3 3 0 0 0 0 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 0 0 0 0 0 0 0 0 0 0 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
stream.defaultRun.004streams.loop01.instance004.log:NUMA nodes: 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 0 0 0 0 0 0 0 0 0 0 0 0

=> Please note that NO bench is running on NUMA node #1, and instances
#3 and #4 are both running on NUMA node #3.
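The placement check described above can be automated. What follows is a minimal sketch only, assuming each "NUMA nodes:" line is a sequence of periodic samples of the node an instance was running on. The function names (parse_numa_lines, dominant_node, placement_report) are hypothetical helpers and not part of the benchmark harness, and the sample input is a shortened trace in the same format, not the full logs.

```python
from collections import Counter


def parse_numa_lines(text):
    """Map log file name -> list of sampled node ids.

    Expects lines in the format printed by `grep "NUMA nodes:" *log`,
    i.e. "<file>:NUMA nodes: n n n ...". Other lines are ignored.
    """
    traces = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":NUMA nodes:")
        if sep:
            traces[name.strip()] = [int(tok) for tok in rest.split()]
    return traces


def dominant_node(samples):
    """Return the node id that occurs most often in one trace."""
    return Counter(samples).most_common(1)[0][0]


def placement_report(traces, n_nodes):
    """Return (nodes hosting more than one instance, nodes hosting none).

    Each instance is attributed to its dominant node only.
    """
    per_node = Counter(dominant_node(s) for s in traces.values())
    shared = sorted(node for node, count in per_node.items() if count > 1)
    idle = sorted(set(range(n_nodes)) - set(per_node))
    return shared, idle


if __name__ == "__main__":
    # Shortened traces for illustration; not the data from the logs.
    sample = """\
instance001.log:NUMA nodes: 2 2 2 1 1 2 2 2
instance002.log:NUMA nodes: 0 0 0 0 0 0 0 0
instance003.log:NUMA nodes: 3 3 0 0 3 3 3 3
instance004.log:NUMA nodes: 3 3 3 3 3 3 0 0
"""
    shared, idle = placement_report(parse_numa_lines(sample), n_nodes=4)
    print("nodes with more than one instance:", shared)
    print("nodes with no instance:", idle)
```

Attributing each instance to its dominant node is a simplification: short migrations, such as the brief visits to node #0 in the traces above, are ignored.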
Sharing a node has a huge performance impact: the stream instances on
node #3 need 19 and 17 seconds to finish, compared to 10 and 12 seconds
for the instances running alone on a NUMA node.

[2] Graph:
https://jhladky.fedorapeople.org/sched_stream_kernel_4.3vs4.4rc4/Stream_benchmark_on_4_NUMA_node_server_4.3vs4.4rc4_kernel.png

Graph legend:
GREEN line => kernel 4.3
BLUE line  => kernel 4.4rc4
x-axis     => number of parallel stream instances
y-axis     => Sum [1/runtime] over all stream instances

Details on server:
DELL PowerEdge R820, 4x E5-4607 0 @ 2.20GHz and 128GB RAM
http://ark.intel.com/products/64604
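For reference, the y-axis metric (Sum [1/runtime] over all stream instances) can be computed from the grep output in [1]. This is a sketch under assumptions: it takes the "User time" value as the runtime, which may not be exactly what the graph uses, and the helper names are hypothetical. The relative_change helper shows how a figure such as "about 24%" would be derived from two such sums; no value for the 4.3 kernel is included here because the 4.3 logs are not part of this mail.

```python
import re

# Matches the runtime in lines such as
# "stream...instance001.log:User time: 12.370 seconds"
USER_TIME = re.compile(r"User time:\s*([0-9.]+)\s*seconds")


def parse_user_times(text):
    """Extract the user times (in seconds) from `grep "User time"` output."""
    return [float(m.group(1)) for m in USER_TIME.finditer(text)]


def aggregate_throughput(runtimes):
    """Sum of 1/runtime over all instances (the y-axis of the graph)."""
    return sum(1.0 / t for t in runtimes)


def relative_change(new, old):
    """Relative change of `new` against `old`; negative means a drop."""
    return (new - old) / old


if __name__ == "__main__":
    grep_output = """\
stream.defaultRun.004streams.loop01.instance001.log:User time: 12.370 seconds
stream.defaultRun.004streams.loop01.instance002.log:User time: 10.560 seconds
stream.defaultRun.004streams.loop01.instance003.log:User time: 19.330 seconds
stream.defaultRun.004streams.loop01.instance004.log:User time: 17.820 seconds
"""
    times = parse_user_times(grep_output)
    print("runtimes [s]:", times)
    print("Sum [1/runtime]: %.4f" % aggregate_throughput(times))
```

Summing reciprocal runtimes weights every instance equally, so two slow instances on a shared node pull the total down even when the other two run at full speed.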