Subject: Re: [PATCH] Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"
From: Paolo Bonzini
To: Waiman Long, Wanpeng Li
Cc: LKML, kvm, Radim Krčmář, Sean Christopherson, Vitaly Kuznetsov,
    Wanpeng Li, Jim Mattson, Joerg Roedel, Peter Zijlstra,
    Thomas Gleixner, Ingo Molnar, loobinliu@tencent.com, "# v3 . 10+"
Date: Wed, 11 Sep 2019 15:04:31 +0200
Message-ID: <9ef778df-c34a-897c-bcfa-780256fb78ff@redhat.com>
In-Reply-To: <2dda32db-5662-f7a6-f52d-b835df1f45f1@redhat.com>
References: <1567993228-23668-1-git-send-email-wanpengli@tencent.com>
 <29d04ee4-60e7-4df9-0c4f-fc29f2b0c6a8@redhat.com>
 <2dda32db-5662-f7a6-f52d-b835df1f45f1@redhat.com>

On 11/09/19 06:25, Waiman Long wrote:
> On 9/10/19 6:56 AM, Wanpeng Li wrote:
>> On Mon, 9 Sep 2019 at 18:56, Waiman Long wrote:
>>> On 9/9/19 2:40 AM, Wanpeng Li wrote:
>>>> From: Wanpeng Li
>>>>
>>>> This patch reverts commit 75437bb304b20 (locking/pvqspinlock: Don't
>>>> wait if vCPU is preempted), which caused a severe performance
>>>> regression.
>>>>
>>>> On a Xeon Skylake box (2 sockets, 40 cores, 80 threads) running three
>>>> VMs with 80 vCPUs each, the score of ebizzy -M drops from 13000-14000
>>>> records/s to 1700-1800 records/s with this commit.
>>>>
>>>>          Host                      Guest              score
>>>>
>>>>  vanilla + w/o kvm optimizations   vanilla            1700-1800 records/s
>>>>  vanilla + w/o kvm optimizations   vanilla + revert   13000-14000 records/s
>>>>  vanilla + w/ kvm optimizations    vanilla            4500-5000 records/s
>>>>  vanilla + w/ kvm optimizations    vanilla + revert   14000-15500 records/s
>>>>
>>>> Exiting the aggressive wait-early mechanism can make vCPUs yield
>>>> prematurely and incur extra scheduling latency in over-subscribed
>>>> scenarios.
>>>>
>>>> kvm optimizations:
>>>> [1] commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts)
>>>> [2] commit 266e85a5ec9 (KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption)
>>>>
>>>> Tested-by: loobinliu@tencent.com
>>>> Cc: Peter Zijlstra
>>>> Cc: Thomas Gleixner
>>>> Cc: Ingo Molnar
>>>> Cc: Waiman Long
>>>> Cc: Paolo Bonzini
>>>> Cc: Radim Krčmář
>>>> Cc: loobinliu@tencent.com
>>>> Cc: stable@vger.kernel.org
>>>> Fixes: 75437bb304b20 (locking/pvqspinlock: Don't wait if vCPU is preempted)
>>>> Signed-off-by: Wanpeng Li
>>>> ---
>>>>  kernel/locking/qspinlock_paravirt.h | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
>>>> index 89bab07..e84d21a 100644
>>>> --- a/kernel/locking/qspinlock_paravirt.h
>>>> +++ b/kernel/locking/qspinlock_paravirt.h
>>>> @@ -269,7 +269,7 @@ pv_wait_early(struct pv_node *prev, int loop)
>>>>  	if ((loop & PV_PREV_CHECK_MASK) != 0)
>>>>  		return false;
>>>>
>>>> -	return READ_ONCE(prev->state) != vcpu_running || vcpu_is_preempted(prev->cpu);
>>>> +	return READ_ONCE(prev->state) != vcpu_running;
>>>>  }
>>>>
>>>>  /*
>>> There are several possibilities for this performance regression:
>>>
>>> 1) Multiple vCPUs calling vcpu_is_preempted() repeatedly may cause
>>> cacheline contention, depending on how that callback is implemented.
>>>
>>> 2) KVM may set the preempt flag for a short period whenever a vmexit
>>> happens, even if a vmenter is executed shortly after. In this case, we
>>> may want to use a more durable vCPU suspend flag that indicates the
>>> vCPU won't get a real CPU back for a longer period of time.
>>>
>>> Perhaps you can add a lock event counter to count the number of
>>> wait_early events caused by vcpu_is_preempted() being true, to see if
>>> it really causes a lot more wait_early events than without the
>>> vcpu_is_preempted() call.
>> pv_wait_again:1:179
>> pv_wait_early:1:189429
>> pv_wait_head:1:263
>> pv_wait_node:1:189429
>> pv_vcpu_is_preempted:1:45588
>> =========sleep 5============
>> pv_wait_again:1:181
>> pv_wait_early:1:202574
>> pv_wait_head:1:267
>> pv_wait_node:1:202590
>> pv_vcpu_is_preempted:1:46336
>>
>> The sampling period is 5s; about 6% of the wait_early events were
>> caused by vcpu_is_preempted() being true.
>
> 6% isn't that high. However, when one vCPU voluntarily releases its
> CPU, all the subsequent waiters in the queue will do the same. It is
> a cascading effect. Perhaps we wait early too aggressively with the
> original patch.
>
> I also looked up the email chain of the original commit. The patch
> submitter did not provide any performance data to support the change;
> the patch just looked reasonable at the time, so there was no
> objection. Given that we now have hard evidence that this was not a
> good idea, I think we should revert it.
>
> Reviewed-by: Waiman Long
>
> Thanks,
> Longman
>

Queued, thanks.

Paolo
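The ~6% figure quoted in the thread can be recovered from the two lockevent snapshots above. A minimal check, assuming the counters are cumulative so the 5-second deltas give the per-interval event counts:

```python
# Lockevent counter snapshots from the thread, taken 5 seconds apart.
before = {"pv_wait_early": 189429, "pv_vcpu_is_preempted": 45588}
after = {"pv_wait_early": 202574, "pv_vcpu_is_preempted": 46336}

# Events that occurred during the 5-second sampling window.
wait_early = after["pv_wait_early"] - before["pv_wait_early"]               # 13145
preempted = after["pv_vcpu_is_preempted"] - before["pv_vcpu_is_preempted"]  # 748

# Fraction of wait_early exits attributed to vcpu_is_preempted() being true.
print(f"{preempted / wait_early:.1%}")  # 5.7%
```

This matches the "6% of wait_early events" statement: the vcpu_is_preempted() check accounts for a small absolute fraction, but as Waiman notes, each early yield can cascade through the rest of the wait queue.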