From: Eric Farman
Subject: sysbench throughput degradation in 4.13+
To: Peter Zijlstra, Rik van Riel
Cc: linux-kernel@vger.kernel.org, Ingo Molnar, Christian Borntraeger, kvm@vger.kernel.org
Date: Tue, 12 Sep 2017 10:14:05 -0400
Message-Id: <95edafb1-5e9d-8461-db73-bcb002b7ebef@linux.vnet.ibm.com>
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

Hi Peter, Rik,

Running sysbench measurements in a 16CPU/30GB KVM guest on a 20CPU/40GB s390x host, we
noticed a throughput degradation (anywhere between 13% and 40%, depending on the test) when moving the host from kernel 4.12 to 4.13. The rest of the host and the entire guest remain unchanged; it is only the host kernel that changes.

Bisecting the host kernel blames commit 3fed382b46ba ("sched/numa: Implement NUMA node level wake_affine()"). Reverting 3fed382b46ba and 815abf5af45f ("sched/fair: Remove effective_load()") from a clean 4.13.0 build erases the throughput degradation and returns us to what we see in 4.12.0.

A little poking around points us to a fix/improvement for this, commit 90001d67be2f ("sched/fair: Fix wake_affine() for !NUMA_BALANCING"), which went in during the 4.14 merge window, and an unmerged fix [1] that corrects a small error in that patch. Hopeful, since we were running with !NUMA_BALANCING, I applied these two patches to a clean 4.13.0 tree but continue to see the performance degradation. Pulling current master or linux-next shows no improvement lurking in the shadows.

Running perf stat on the host during the guest sysbench run shows a significant increase in cpu-migrations over the 4.12.0 run. Abbreviated examples follow:

# 4.12.0
# perf stat -p 11473 -- sleep 5
    62305.199305    task-clock (msec)    # 12.458 CPUs
         368,607    context-switches
           4,084    cpu-migrations
             416    page-faults

# 4.13.0
# perf stat -p 11444 -- sleep 5
    35892.653243    task-clock (msec)    #  7.176 CPUs
         249,251    context-switches
          56,850    cpu-migrations
             804    page-faults

# 4.13.0-revert-3fed382b46ba-and-815abf5af45f
# perf stat -p 11441 -- sleep 5
    62321.767146    task-clock (msec)    # 12.459 CPUs
         387,661    context-switches
           5,687    cpu-migrations
           1,652    page-faults

# 4.13.0-apply-90001d67be2f
# perf stat -p 11438 -- sleep 5
    48654.988291    task-clock (msec)    #  9.729 CPUs
         363,150    context-switches
          43,778    cpu-migrations
             641    page-faults

I'm not sure what documentation to supply here, and I am unfamiliar with this code and its recent changes, but I'd be happy to pull/try whatever is needed to help debug things.
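To put the perf numbers above side by side, here is a quick back-of-the-envelope calculation (plain Python, using only the figures already quoted from the 5-second samples; the kernel labels match the runs above) that normalizes cpu-migrations against consumed task-clock time:

```python
# Migration rate per second of consumed task-clock, from the
# perf stat samples quoted in this mail.
samples = {
    # kernel label:               (task-clock msec, cpu-migrations)
    "4.12.0":                     (62305.199305, 4084),
    "4.13.0":                     (35892.653243, 56850),
    "4.13.0-revert-both":         (62321.767146, 5687),
    "4.13.0-apply-90001d67be2f":  (48654.988291, 43778),
}

for kernel, (task_clock_ms, migrations) in samples.items():
    rate = migrations / (task_clock_ms / 1000.0)
    print(f"{kernel:28s} {rate:8.1f} migrations per task-second")
```

Roughly: 4.12.0 and the reverted 4.13.0 sit in the same ballpark, while stock 4.13.0 migrates tasks more than twenty times as often per second of task time, and 90001d67be2f only partially closes that gap.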
Looking forward to hearing what I can do.

Thanks,
Eric

[1] https://lkml.org/lkml/2017/9/6/196