Date: Tue, 28 Jun 2022 13:28:25 -0400
Subject: Re: [PATCH v7] x86/paravirt: useless assignment instructions cause Unixbench full core performance degradation
To: Guo Hui, peterz@infradead.org
Cc: jgross@suse.com, srivatsa@csail.mit.edu, amakhalov@vmware.com, pv-drivers@vmware.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com, will@kernel.org, boqun.feng@gmail.com, virtualization@lists.linux-foundation.org, wangxiaohua@uniontech.com, linux-kernel@vger.kernel.org
References: <588a3276-5481-0a9f-9eac-fed09eede4f2@redhat.com> <20220628161251.21950-1-guohui@uniontech.com>
From: Waiman Long
In-Reply-To: <20220628161251.21950-1-guohui@uniontech.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 6/28/22 12:12, Guo Hui wrote:
> The instructions
> assigned to the vcpu_is_preempted function parameter
> in the X86 architecture physical machine are redundant instructions,
> causing the multi-core performance of Unixbench to drop by about 4% to 5%.
> The C function is as follows:
> static bool vcpu_is_preempted(long vcpu);
>
> The parameter 'vcpu' in the function osq_lock
> that calls the function vcpu_is_preempted is assigned as follows:
>
> The C code is in the function node_cpu:
> cpu = node->cpu - 1;
>
> The instructions corresponding to the C code are:
> mov 0x14(%rax),%edi
> sub $0x1,%edi
>
> The above instructions are unnecessary
> in the X86 Native operating environment,
> causing high cache-misses and degrading performance.
>
> This patch uses static_key to not execute this instruction
> in the Native runtime environment.
>
> The patch effect is as follows two machines,
> Unixbench runs with full core score:
>
> 1. Machine configuration:
> Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
> CPU core: 40
> Memory: 256G
> OS Kernel: 5.19-rc3
>
> Before using the patch:
> System Benchmarks Index Values          BASELINE        RESULT     INDEX
> Dhrystone 2 using register variables    116700.0   948326591.2   81261.9
> Double-Precision Whetstone                  55.0      211986.3   38543.0
> Execl Throughput                            43.0       43453.2   10105.4
> File Copy 1024 bufsize 2000 maxblocks     3960.0      438936.2    1108.4
> File Copy 256 bufsize 500 maxblocks       1655.0      118197.4     714.2
> File Copy 4096 bufsize 8000 maxblocks     5800.0     1534674.7    2646.0
> Pipe Throughput                          12440.0    46482107.6   37365.0
> Pipe-based Context Switching              4000.0     1915094.2    4787.7
> Process Creation                           126.0       85442.2    6781.1
> Shell Scripts (1 concurrent)                42.4       69400.7   16368.1
> Shell Scripts (8 concurrent)                 6.0        8877.2   14795.3
> System Call Overhead                     15000.0     4714906.1    3143.3
>                                                                 ========
> System Benchmarks Index Score                                     7923.3
>
> After using the patch:
> System Benchmarks Index Values          BASELINE        RESULT     INDEX
> Dhrystone 2 using register variables    116700.0   947032915.5   81151.1
> Double-Precision Whetstone                  55.0      211971.2   38540.2
> Execl Throughput                            43.0       45054.8   10477.9
> File Copy 1024 bufsize 2000 maxblocks     3960.0      515024.9    1300.6
> File Copy 256 bufsize 500 maxblocks       1655.0      146354.6     884.3
> File Copy 4096 bufsize 8000 maxblocks     5800.0     1679995.9    2896.5
> Pipe Throughput                          12440.0    46466394.2   37352.4
> Pipe-based Context Switching              4000.0     1898221.4    4745.6
> Process Creation                           126.0       85653.1    6797.9
> Shell Scripts (1 concurrent)                42.4       69437.3   16376.7
> Shell Scripts (8 concurrent)                 6.0        8898.9   14831.4
> System Call Overhead                     15000.0     4658746.7    3105.8
>                                                                 ========
> System Benchmarks Index Score                                     8248.8
>
> 2. Machine configuration:
> Hygon C86 7185 32-core Processor
> CPU core: 128
> Memory: 256G
> OS Kernel: 5.19-rc3
>
> Before using the patch:
> System Benchmarks Index Values          BASELINE        RESULT     INDEX
> Dhrystone 2 using register variables    116700.0  2256644068.3  193371.4
> Double-Precision Whetstone                  55.0      438969.9   79812.7
> Execl Throughput                            43.0       10108.6    2350.8
> File Copy 1024 bufsize 2000 maxblocks     3960.0      275892.8     696.7
> File Copy 256 bufsize 500 maxblocks       1655.0       72082.7     435.5
> File Copy 4096 bufsize 8000 maxblocks     5800.0      925043.4    1594.9
> Pipe Throughput                          12440.0   118905512.5   95583.2
> Pipe-based Context Switching              4000.0     7820945.7   19552.4
> Process Creation                           126.0       31233.3    2478.8
> Shell Scripts (1 concurrent)                42.4       49042.8   11566.7
> Shell Scripts (8 concurrent)                 6.0        6656.0   11093.3
> System Call Overhead                     15000.0     6816047.5    4544.0
>                                                                 ========
> System Benchmarks Index Score                                     7756.6
>
> After using the patch:
> System Benchmarks Index Values          BASELINE        RESULT     INDEX
> Dhrystone 2 using register variables    116700.0  2252272929.4  192996.8
> Double-Precision Whetstone                  55.0      451847.2   82154.0
> Execl Throughput                            43.0       10595.1    2464.0
> File Copy 1024 bufsize 2000 maxblocks     3960.0      301279.3     760.8
> File Copy 256 bufsize 500 maxblocks       1655.0       79291.3     479.1
> File Copy 4096 bufsize 8000 maxblocks     5800.0     1039755.2    1792.7
> Pipe Throughput                          12440.0   118701468.1   95419.2
> Pipe-based Context Switching              4000.0     8073453.3   20183.6
> Process Creation                           126.0       33440.9    2654.0
> Shell Scripts (1 concurrent)                42.4       52722.6   12434.6
> Shell Scripts (8 concurrent)                 6.0        7050.4   11750.6
> System Call Overhead                     15000.0     6834371.5    4556.2
>                                                                 ========
> System Benchmarks Index Score                                     8157.8
>
> Signed-off-by: Guo Hui
> ---
>  arch/x86/kernel/paravirt-spinlocks.c |  4 ++++
>  kernel/locking/osq_lock.c            | 19 ++++++++++++++++++-
>  2 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/paravirt-spinlocks.c b/arch/x86/kernel/paravirt-spinlocks.c
> index 9e1ea99ad..a2eb375e2 100644
> --- a/arch/x86/kernel/paravirt-spinlocks.c
> +++ b/arch/x86/kernel/paravirt-spinlocks.c
> @@ -33,6 +33,8 @@ bool pv_is_native_vcpu_is_preempted(void)
>  		__raw_callee_save___native_vcpu_is_preempted;
>  }
>
> +DECLARE_STATIC_KEY_TRUE(vcpu_has_preemption);
> +
>  void __init paravirt_set_cap(void)
>  {
>  	if (!pv_is_native_spin_unlock())
> @@ -40,4 +42,6 @@ void __init paravirt_set_cap(void)
>
>  	if (!pv_is_native_vcpu_is_preempted())
>  		setup_force_cpu_cap(X86_FEATURE_VCPUPREEMPT);
> +	else
> +		static_branch_disable(&vcpu_has_preemption);
>  }
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index d5610ad52..f521b0f6d 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -27,6 +27,23 @@ static inline int node_cpu(struct optimistic_spin_node *node)
>  	return node->cpu - 1;
>  }
>
> +#ifdef vcpu_is_preempted
> +DEFINE_STATIC_KEY_TRUE(vcpu_has_preemption);
> +
> +static inline bool vcpu_is_preempted_node(struct optimistic_spin_node *node)
> +{
> +	if (static_branch_likely(&vcpu_has_preemption))
> +		return vcpu_is_preempted(node_cpu(node->prev));
> +
> +	return false;
> +}
> +#else
> +static inline bool vcpu_is_preempted_node(struct optimistic_spin_node *node)
> +{
> +	return false;
> +}
> +#endif
> +
>  static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
>  {
>  	int cpu_nr = encoded_cpu_val - 1;
> @@ -141,7 +158,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>  	 * polling, be careful.
>  	 */
>  	if (smp_cond_load_relaxed(&node->locked, VAL || need_resched() ||
> -				  vcpu_is_preempted(node_cpu(node->prev))))
> +				  vcpu_is_preempted_node(node)))
>  		return true;
>
>  	/* unqueue */

Reviewed-by: Waiman Long
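
For anyone reading the thread later, the effect of the wrapper can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: struct spin_node, fake_vcpu_is_preempted() and decode_count are hypothetical names invented here, and an ordinary bool stands in for the static key (the real static_branch_likely() patches a jump at runtime rather than testing a variable). What it shows is that once the flag is off, node->prev->cpu is never loaded or decoded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct optimistic_spin_node (fields used here only). */
struct spin_node {
	struct spin_node *prev;
	int cpu;	/* encoded CPU number: cpu_nr + 1 */
};

/* Assumption: a plain flag models the static key vcpu_has_preemption. */
static bool vcpu_has_preemption = true;

/* Instrumentation for this sketch: how often the CPU number was decoded. */
static unsigned long decode_count;

static int node_cpu(const struct spin_node *node)
{
	decode_count++;
	return node->cpu - 1;
}

/* Hypothetical hypervisor query: pretends that odd-numbered CPUs are preempted. */
static bool fake_vcpu_is_preempted(int cpu)
{
	return (cpu & 1) != 0;
}

/* Same shape as the patch's wrapper: decode the argument only when it is needed. */
static bool vcpu_is_preempted_node(const struct spin_node *node)
{
	assert(node->prev != NULL);

	if (vcpu_has_preemption)
		return fake_vcpu_is_preempted(node_cpu(node->prev));

	return false;
}
```

With the flag set, each call decodes the previous node's CPU once; with the flag cleared (the bare-metal case in the patch), decode_count stops increasing and the function returns false.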