From: 王金浦
Date: Wed, 13 Sep 2017 10:24:28 +0200
Subject: Re: sysbench throughput degradation in 4.13+
To: Eric Farman
Cc: Peter Zijlstra, Rik van Riel, LKML, Ingo Molnar, Christian Borntraeger, "KVM-ML (kvm@vger.kernel.org)", vcaputo@pengaru.com

2017-09-12 16:14 GMT+02:00 Eric Farman:
> Hi Peter, Rik,
>
> Running sysbench measurements in a 16CPU/30GB KVM guest on a 20CPU/40GB
> s390x host, we noticed a throughput degradation (anywhere between 13% and
> 40%, depending on test) when moving the host from kernel 4.12 to 4.13. The
> rest of the host and the entire guest remain unchanged; it is only the host
> kernel that changes. Bisecting the host kernel blames commit 3fed382b46ba
> ("sched/numa: Implement NUMA node level wake_affine()").
>
> Reverting 3fed382b46ba and 815abf5af45f ("sched/fair: Remove
> effective_load()") from a clean 4.13.0 build erases the throughput
> degradation and returns us to what we see in 4.12.0.
>
> A little poking around points us to a fix/improvement for this, commit
> 90001d67be2f ("sched/fair: Fix wake_affine() for !NUMA_BALANCING"), which
> went in during the 4.14 merge window, and an unmerged fix [1] that corrects
> a small error in that patch. Hopeful, since we were running with
> !NUMA_BALANCING, I applied these two patches to a clean 4.13.0 tree but
> continue to see the performance degradation. Pulling current master or
> linux-next shows no improvement lurking in the shadows.
>
> Running perf stat on the host during the guest sysbench run shows a
> significant increase in cpu-migrations over the 4.12.0 run. Abbreviated
> examples follow:
>
> # 4.12.0
> # perf stat -p 11473 -- sleep 5
>     62305.199305  task-clock (msec)  # 12.458 CPUs
>          368,607  context-switches
>            4,084  cpu-migrations
>              416  page-faults
>
> # 4.13.0
> # perf stat -p 11444 -- sleep 5
>     35892.653243  task-clock (msec)  #  7.176 CPUs
>          249,251  context-switches
>           56,850  cpu-migrations
>              804  page-faults
>
> # 4.13.0-revert-3fed382b46ba-and-815abf5af45f
> # perf stat -p 11441 -- sleep 5
>     62321.767146  task-clock (msec)  # 12.459 CPUs
>          387,661  context-switches
>            5,687  cpu-migrations
>            1,652  page-faults
>
> # 4.13.0-apply-90001d67be2f
> # perf stat -p 11438 -- sleep 5
>     48654.988291  task-clock (msec)  #  9.729 CPUs
>          363,150  context-switches
>           43,778  cpu-migrations
>              641  page-faults
>
> I'm not sure what doc to supply here and am unfamiliar with this code or its
> recent changes, but I'd be happy to pull/try whatever is needed to help
> debug things. Looking forward to hearing what I can do.
>
> Thanks,
> Eric
>
> [1] https://lkml.org/lkml/2017/9/6/196

+cc: vcaputo@pengaru.com

He also reported a performance degradation on 4.13-rc7; it might have the same cause.

Best,
Jack
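
Since the four perf stat samples were taken over different amounts of task-clock time, the raw cpu-migrations counts are easier to compare when normalized to a rate. A minimal sketch (all values copied verbatim from the samples quoted above; task-clock is in msec):

```python
# Normalize the quoted perf stat cpu-migrations counts to migrations per
# second of task-clock time, so the four kernels can be compared directly.
samples = {
    "4.12.0":                     {"task_clock_ms": 62305.199305, "migrations": 4084},
    "4.13.0":                     {"task_clock_ms": 35892.653243, "migrations": 56850},
    "4.13.0-revert-both":         {"task_clock_ms": 62321.767146, "migrations": 5687},
    "4.13.0-apply-90001d67be2f":  {"task_clock_ms": 48654.988291, "migrations": 43778},
}

def migration_rate(sample):
    """Migrations per second of task-clock time."""
    return sample["migrations"] / (sample["task_clock_ms"] / 1000.0)

for name, sample in samples.items():
    print(f"{name:28s} {migration_rate(sample):8.1f} migrations/s")
```

On these numbers, stock 4.13.0 migrates tasks more than twenty times as often per second of task-clock as 4.12.0, while the revert brings the rate back near the 4.12.0 level and the 90001d67be2f fix only partially closes the gap — consistent with the throughput observations in the report.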