From: 王金浦
Date: Wed, 13 Sep 2017 10:24:28 +0200
Subject: Re: sysbench throughput degradation in 4.13+
To: Eric Farman
Cc: Peter Zijlstra, Rik van Riel, LKML, Ingo Molnar, Christian Borntraeger, "KVM-ML (kvm@vger.kernel.org)", vcaputo@pengaru.com

2017-09-12 16:14 GMT+02:00 Eric Farman:
> Hi Peter, Rik,
>
> Running sysbench measurements in a 16CPU/30GB KVM guest on a 20CPU/40GB
> s390x host, we noticed a throughput degradation (anywhere between 13% and
> 40%, depending on test) when moving the host from kernel 4.12 to 4.13. The
> rest of the host and the entire guest remain unchanged; it is only the host
> kernel that changes. Bisecting the host kernel blames commit 3fed382b46ba
> ("sched/numa: Implement NUMA node level wake_affine()").
>
> Reverting 3fed382b46ba and 815abf5af45f ("sched/fair: Remove
> effective_load()") from a clean 4.13.0 build erases the throughput
> degradation and returns us to what we see in 4.12.0.
>
> A little poking around points us to a fix/improvement for this, commit
> 90001d67be2f ("sched/fair: Fix wake_affine() for !NUMA_BALANCING"), which
> went in during the 4.14 merge window, and an unmerged fix [1] that corrects
> a small error in that patch. Hopeful, since we were running with
> !NUMA_BALANCING, I applied these two patches to a clean 4.13.0 tree but
> continue to see the performance degradation. Pulling current master or
> linux-next shows no improvement lurking in the shadows.
>
> Running perf stat on the host during the guest sysbench run shows a
> significant increase in cpu-migrations over the 4.12.0 run. Abbreviated
> examples follow:
>
> # 4.12.0
> # perf stat -p 11473 -- sleep 5
>     62305.199305  task-clock (msec)  # 12.458 CPUs
>          368,607  context-switches
>            4,084  cpu-migrations
>              416  page-faults
>
> # 4.13.0
> # perf stat -p 11444 -- sleep 5
>     35892.653243  task-clock (msec)  #  7.176 CPUs
>          249,251  context-switches
>           56,850  cpu-migrations
>              804  page-faults
>
> # 4.13.0-revert-3fed382b46ba-and-815abf5af45f
> # perf stat -p 11441 -- sleep 5
>     62321.767146  task-clock (msec)  # 12.459 CPUs
>          387,661  context-switches
>            5,687  cpu-migrations
>            1,652  page-faults
>
> # 4.13.0-apply-90001d67be2f
> # perf stat -p 11438 -- sleep 5
>     48654.988291  task-clock (msec)  #  9.729 CPUs
>          363,150  context-switches
>           43,778  cpu-migrations
>              641  page-faults
>
> I'm not sure what doc to supply here and am unfamiliar with this code or its
> recent changes, but I'd be happy to pull/try whatever is needed to help
> debug things. Looking forward to hearing what I can do.
>
> Thanks,
> Eric
>
> [1] https://lkml.org/lkml/2017/9/6/196

+cc: vcaputo@pengaru.com

He also reported a performance degradation on 4.13-rc7; it might have the same cause.

Best,
Jack
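
Since the four perf stat samples were taken over different amounts of task-clock time, the raw cpu-migrations counts are easier to compare when normalized to a rate. A minimal sketch (all values copied verbatim from the samples quoted above; task-clock is in msec):

```python
# Normalize the quoted perf stat cpu-migrations counts to migrations per
# second of task-clock time, so the four kernels can be compared directly.
samples = {
    "4.12.0":                     {"task_clock_ms": 62305.199305, "migrations": 4084},
    "4.13.0":                     {"task_clock_ms": 35892.653243, "migrations": 56850},
    "4.13.0-revert-both":         {"task_clock_ms": 62321.767146, "migrations": 5687},
    "4.13.0-apply-90001d67be2f":  {"task_clock_ms": 48654.988291, "migrations": 43778},
}

def migration_rate(sample):
    """Migrations per second of task-clock time."""
    return sample["migrations"] / (sample["task_clock_ms"] / 1000.0)

for name, sample in samples.items():
    print(f"{name:28s} {migration_rate(sample):8.1f} migrations/s")
```

On these numbers, stock 4.13.0 migrates tasks more than twenty times as often per second of task-clock as 4.12.0, while the revert brings the rate back near the 4.12.0 level and the 90001d67be2f fix only partially closes the gap — consistent with the throughput observations in the report.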