Message-ID: <1516628254.7500.19.camel@gmx.de>
Subject: Re: unixbench context switch perfomance & cpu topology
From: Mike Galbraith
To: Wanpeng Li
Cc: linux-kernel@vger.kernel.org, kvm, Paolo Bonzini, Peter Zijlstra,
    Radim Krcmar, Frederic Weisbecker, Thomas Gleixner, Ingo Molnar
Date: Mon, 22 Jan 2018 14:37:34 +0100
References: <1516622939.24679.5.camel@gmx.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2018-01-22 at 20:27 +0800, Wanpeng Li wrote:
> 2018-01-22 20:08 GMT+08:00 Mike Galbraith:
> > On Mon, 2018-01-22 at 19:47 +0800, Wanpeng Li wrote:
> >> Hi all,
> >>
> >> We can observe unixbench context switch performance is heavily
> >> influenced by cpu topology which is exposed to the guest. the score is
> >> posted below, bigger is better, both the guest and the host kernel are
> >> 3.15-rc3(we can also reproduce against centos 7.4 693 guest/host), LLC
> >> is exposed to the guest, kvm adaptive halt-polling is default enabled,
> >> then start a guest w/ 8 logical cpus.
> >>
> >>                                          unixbench context switch
> >> -smp 8, sockets=8, cores=1, threads=1             382036
> >> -smp 8, sockets=4, cores=2, threads=1             132480
> >> -smp 8, sockets=2, cores=4, threads=1             128032
> >> -smp 8, sockets=2, cores=2, threads=2             131767
> >> -smp 8, sockets=1, cores=4, threads=2             132742
> >> -smp 8, sockets=1, cores=4, threads=2
> >>   (guest w/ nohz=off idle=poll)                   331471
> >>
> >> I can observe there are a lot of reschedule IPIs sent from one vCPU to
> >> another vCPU, the context switch workload switches between running and
> >> idle frequently which results in HLT instruction in the idle path, I
> >> use idle=poll to avoid vmexit due to HLT and to avoid reschedule IPIs
> >> since idle task checks TIF_NEED_RESCHED flags in a loop, nohz=off can
> >> stop to program lapic timer/other nohz stuffs. Any idea why sockets=8
> >> can get best performance?
> >
> > Probably because with that topology, there is no shared llc, thus no
> > cross-core scheduling, micro-benchmark waker/wakee are stacked. If
> > your benchmark does nothing but schedule, stacking makes beautiful (but
> > utterly meaningless) numbers.
>
> The waker and wakee are just sporadic on the same logical cpu in the
> guest(-smp 8, sockets=8, cores=1, threads=1) during the testing, in
> addition, binding the waker/wakee to one logical cpu in the guest(-smp
> 8, sockets=1, cores=4, threads=2) also can get the performance as
> better as 8 sockets setup.

Here, with tip.today and that topology, context1 does stack up on one
core.

  PID USER      PR  NI    VIRT    RES    SHR S   %CPU  %MEM     TIME+ P COMMAND
 4218 root      20   0    4048    808    732 R  52.16 0.022   0:12.77 4 context1
 4219 root      20   0    4048     80      0 S  47.18 0.002   0:11.96 4 context1

There's a bit of bouncing, but the two stack right back up.  But
whatever, what Peter said, the benchmark should pin itself to do this.

	-Mike
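
For reference, a minimal sketch of what "pin itself" could look like,
assuming Linux and Python 3. pingpong() is a hypothetical stand-in for
a pipe-based waker/wakee pair, not the unixbench context1 source. It
sets the CPU affinity before fork() so both tasks are stacked on one
logical CPU regardless of the topology the guest sees.

```python
import os
import time


def pingpong(rounds, cpu=None):
    """Bounce one byte between parent and child over two pipes.

    If cpu is given, both tasks are pinned to that logical CPU for the
    duration of the run. Returns round trips per second.
    """
    saved = os.sched_getaffinity(0)
    if cpu is not None:
        # The child inherits this mask across fork(), so waker and
        # wakee end up on the same CPU.
        os.sched_setaffinity(0, {cpu})
    try:
        p2c_r, p2c_w = os.pipe()  # parent -> child
        c2p_r, c2p_w = os.pipe()  # child -> parent
        pid = os.fork()
        if pid == 0:
            # Child: echo every byte straight back, then exit.
            os.close(p2c_w)
            os.close(c2p_r)
            for _ in range(rounds):
                os.write(c2p_w, os.read(p2c_r, 1))
            os._exit(0)
        os.close(p2c_r)
        os.close(c2p_w)
        start = time.perf_counter()
        for _ in range(rounds):
            os.write(p2c_w, b"x")  # wake the child
            os.read(c2p_r, 1)      # sleep until it answers
        elapsed = time.perf_counter() - start
        os.waitpid(pid, 0)
        os.close(p2c_w)
        os.close(c2p_r)
        return rounds / elapsed
    finally:
        # Restore the caller's original affinity mask.
        os.sched_setaffinity(0, saved)
```

Running pingpong(100000, cpu=min(os.sched_getaffinity(0))) next to
pingpong(100000) on the same guest is one way to see how much of a
score comes from stacking rather than from the topology itself.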