Date: Wed, 8 Jul 2015 17:45:50 +0200
From: Andres Freund
To: linux-kernel@vger.kernel.org, Mike Galbraith
Cc: Ingo Molnar, Peter Zijlstra
Subject: Significant performance difference for postgres w/o sched_autogroup
Message-ID: <20150708154550.GH340@alap3.anarazel.de>

Hi,

while debugging a performance issue I noticed the following very odd
bit.

Kernel: 4.1.0-rc7-andres-00049-ge64f638
CPU: 2x E5520
Workload: postgresql (a multi-process SQL server communicating via
shared memory) and its benchmark client (pgbench, multi-threaded), both
running on the same machine and connected using unix sockets. The
statements are relatively simple (~1.5ms on average), forcing frequent
back-and-forth between server processes and pgbench threads.

I found that disabling sched_autogroup *significantly* reduces
throughput, even when both server and client are started from the same
terminal and thus should be in the same group!

There's also a significant difference in %sys time with autogroup
enabled vs. disabled: ~8% vs. ~27%. That seems like too much.
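For reference, the comparison roughly looks like the following. This is
only a dry-run sketch (it echoes the commands instead of executing
them); the data directory, client/thread counts, and durations are my
assumptions, not the exact values used above:

```shell
#!/bin/sh
# Dry-run sketch of the autogroup on/off comparison.  Replace run() with
# direct execution (as root) to actually reproduce; paths and pgbench
# parameters below are assumptions.
run() { CMDS="$CMDS $*"; echo "$@"; }

for ag in 1 0; do
    # toggle autogroup scheduling (1 = enabled, 0 = disabled)
    run sysctl -w kernel.sched_autogroup_enabled=$ag
    # server and client share a terminal; connections go over unix sockets
    run pg_ctl -D /srv/pgdata restart
    # record kernel-side cycles system-wide while the benchmark runs
    run perf record -g -a -e cycles:k -o perf.data.ag$ag -- \
        pgbench -h /tmp -c 16 -j 16 -P 1 -T 10 pgbench
done
# compare the two profiles
run perf diff perf.data.ag1 perf.data.ag0
```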
The difference in postgres performance is significant:

enabled:
progress: 1.0 s, 25857.2 tps, lat 1.171 ms stddev 3.314
progress: 2.0 s, 19582.5 tps, lat 1.586 ms stddev 5.349
progress: 3.0 s, 17796.3 tps, lat 1.804 ms stddev 7.364
progress: 4.0 s, 18736.7 tps, lat 1.696 ms stddev 7.612
progress: 5.0 s, 19269.9 tps, lat 1.640 ms stddev 6.835
progress: 6.0 s, 21389.6 tps, lat 1.467 ms stddev 5.663
progress: 7.0 s, 22153.8 tps, lat 1.417 ms stddev 8.417
progress: 8.0 s, 21550.7 tps, lat 1.487 ms stddev 7.460
progress: 9.0 s, 22602.3 tps, lat 1.401 ms stddev 6.738
progress: 10.0 s, 23475.5 tps, lat 1.327 ms stddev 7.111
tps = 21329.219141 (excluding connections establishing)

disabled:
progress: 1.0 s, 25092.5 tps, lat 1.255 ms stddev 0.511
progress: 2.0 s, 13322.0 tps, lat 2.377 ms stddev 5.435
progress: 3.0 s, 13903.0 tps, lat 2.298 ms stddev 5.075
progress: 4.0 s, 13734.0 tps, lat 2.328 ms stddev 5.146
progress: 5.0 s, 14015.0 tps, lat 2.281 ms stddev 5.099
progress: 6.0 s, 14034.0 tps, lat 2.277 ms stddev 5.002
progress: 7.0 s, 13930.0 tps, lat 2.297 ms stddev 5.136
progress: 8.0 s, 14015.0 tps, lat 2.279 ms stddev 5.161
progress: 9.0 s, 14000.0 tps, lat 2.287 ms stddev 5.117
progress: 10.0 s, 13900.6 tps, lat 2.299 ms stddev 5.142
tps = 15006.317841 (excluding connections establishing)

(note the odd dip after the first second in the disabled run; that's
what I was actually debugging)

A perf diff of -g -a -e cycles:k shows:

# Baseline    Delta  Shared Object       Symbol
# ........  .......  ..................  ..............................
    11.70%   +3.87%  [kernel.vmlinux]    [k] unix_stream_recvmsg
     6.74%   +2.49%  [kernel.vmlinux]    [k] __fget
     6.69%   +4.31%  [kernel.vmlinux]    [k] sys_recvfrom
     6.49%   +1.77%  [kernel.vmlinux]    [k] system_call_after_swapgs
     5.37%   +1.39%  [kernel.vmlinux]    [k] _raw_spin_lock
     4.63%   +1.76%  [kernel.vmlinux]    [k] mutex_lock_interruptible
     4.27%   +1.27%  [kernel.vmlinux]    [k] mutex_unlock
     3.43%   +1.22%  [kernel.vmlinux]    [k] fput
     3.41%   -0.21%  [kernel.vmlinux]    [k] __fget_light
     2.96%   +1.33%  libpthread-2.19.so  [.] __libc_recv
     2.56%   +2.35%  [kernel.vmlinux]    [k] system_call_fastpath
     2.36%   +0.80%  [kernel.vmlinux]    [k] import_single_range
     2.30%   +0.82%  [kernel.vmlinux]    [k] sockfd_lookup_light

Looking a bit further into the profile shows an apparent difference in
how often recv goes through the fastpath:

enabled:
-  6.55%  pgbench  [kernel.vmlinux]  [k] sys_recvfrom
   - sys_recvfrom
      + 85.88% system_call_fastpath
      + 14.12% __libc_recv

disabled:
- 10.97%  pgbench  [kernel.vmlinux]  [k] sys_recvfrom
   - sys_recvfrom
      + 74.16% system_call_fastpath
      + 25.84% __libc_recv

Is that actually expected, or is there a bug somewhere? The difference
seems rather large to me. If interesting, I can try to script the
workload in a reproducible manner.

Regards,

Andres