From: Davidlohr Bueso <dave@stgolabs.net>
To: npiggin@gmail.com
Cc: peterz@infradead.org, mingo@redhat.com, will@kernel.org,
	longman@redhat.com, mpe@ellerman.id.au, benh@kernel.crashing.org,
	paulus@samba.org, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, dave@stgolabs.net,
	Davidlohr Bueso <dave@stgolabs.net>
Subject: [PATCH 3/3] powerpc/qspinlock: Use generic smp_cond_load_relaxed
Date: Mon, 8 Mar 2021 17:59:50 -0800
Message-Id: <20210309015950.27688-4-dave@stgolabs.net>
In-Reply-To: <20210309015950.27688-1-dave@stgolabs.net>
References: <20210309015950.27688-1-dave@stgolabs.net>

49a7d46a06c3 (powerpc: Implement smp_cond_load_relaxed()) added
busy-waiting pausing with a preferred SMT priority pattern, lowering
the priority (reducing decode cycles) during the whole loop
slowpath. However, data shows that while this pattern works well with
simple spinlocks, queued spinlocks benefit more from being kept at
medium priority, using cpu_relax() instead, which on powerpc is a
low+medium priority combination (see the reference sketch after the
results below).

Data is from three benchmarks on a Power9: 9008-22L, 64 CPUs with
2 sockets and 8 threads per core.

1. locktorture.

This is data for the lowest and most artificial/pathological level,
with increasing thread counts pounding on the lock. Metrics are total
ops/minute. Despite some small hits in the 4-8 range, scenarios are
either neutral or favorable to this patch.

+=========+==========+==========+=======+
| # tasks | vanilla  | dirty    | %diff |
+=========+==========+==========+=======+
| 2       | 46718565 | 48751350 |  4.35 |
+---------+----------+----------+-------+
| 4       | 51740198 | 50369082 | -2.65 |
+---------+----------+----------+-------+
| 8       | 63756510 | 62568821 | -1.86 |
+---------+----------+----------+-------+
| 16      | 67824531 | 70966546 |  4.63 |
+---------+----------+----------+-------+
| 32      | 53843519 | 61155508 | 13.58 |
+---------+----------+----------+-------+
| 64      | 53005778 | 53104412 |  0.18 |
+---------+----------+----------+-------+
| 128     | 53331980 | 54606910 |  2.39 |
+=========+==========+==========+=======+

2. sockperf (tcp throughput)

Here a client will do one-way throughput tests to a localhost server,
with increasing message sizes, dealing with the sk_lock. This patch
restores the performance of the qspinlock to par with that of the
simple lock:

		     simple-spinlock       vanilla              dirty
Hmean     14    73.50 (   0.00%)    54.44 * -25.93%*    73.45 * -0.07%*
Hmean     100  654.47 (   0.00%)   385.61 * -41.08%*   771.43 * 17.87%*
Hmean     300 2719.39 (   0.00%)  2181.67 * -19.77%*  2666.50 * -1.94%*
Hmean     500 4400.59 (   0.00%)  3390.77 * -22.95%*  4322.14 * -1.78%*
Hmean     850 6726.21 (   0.00%)  5264.03 * -21.74%*  6863.12 *  2.04%*

3. dbench (tmpfs)

Configured to run with up to ncpusx8 clients, it shows both latency
and throughput metrics. For the latency, with the exception of the
64-client case, there is really nothing to go by:

			vanilla              dirty
Amean	latency-1	  1.67 (   0.00%)    1.67 *  0.09%*
Amean	latency-2	  2.15 (   0.00%)    2.08 *  3.36%*
Amean	latency-4	  2.50 (   0.00%)    2.56 * -2.27%*
Amean	latency-8	  2.49 (   0.00%)    2.48 *  0.31%*
Amean	latency-16	  2.69 (   0.00%)    2.72 * -1.37%*
Amean	latency-32	  2.96 (   0.00%)    3.04 * -2.60%*
Amean	latency-64	  7.78 (   0.00%)    8.17 * -5.07%*
Amean	latency-512	186.91 (   0.00%)  186.41 *  0.27%*

For the dbench4 Throughput (misleading but traditional) there's a
small but rather constant improvement:

		vanilla              dirty
Hmean	1	  849.13 (   0.00%)    851.51 *  0.28%*
Hmean	2	 1664.03 (   0.00%)   1663.94 * -0.01%*
Hmean	4	 3073.70 (   0.00%)   3104.29 *  1.00%*
Hmean	8	 5624.02 (   0.00%)   5694.16 *  1.25%*
Hmean	16	 9169.49 (   0.00%)   9324.43 *  1.69%*
Hmean	32	11969.37 (   0.00%)  12127.09 *  1.32%*
Hmean	64	15021.12 (   0.00%)  15243.14 *  1.48%*
Hmean	512	14891.27 (   0.00%)  15162.11 *  1.82%*

Measuring the dbench4 Per-VFS Operation latency shows some very minor
differences within the noise level, around the 0-1% range.
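For reference, a sketch of what powerpc now falls back to: the generic
smp_cond_load_relaxed() in include/asm-generic/barrier.h pauses with
cpu_relax() on every iteration, and on ppc64 cpu_relax() is the
low+medium SMT priority combination mentioned above. Both snippets are
quoted from the kernel tree of this era for illustration only; they
are not part of this diff:

  /* include/asm-generic/barrier.h: generic fallback, now used by powerpc */
  #ifndef smp_cond_load_relaxed
  #define smp_cond_load_relaxed(ptr, cond_expr) ({	\
  	typeof(ptr) __PTR = (ptr);			\
  	__unqual_scalar_typeof(*ptr) VAL;		\
  	for (;;) {					\
  		VAL = READ_ONCE(*__PTR);		\
  		if (cond_expr)				\
  			break;				\
  		cpu_relax();				\
  	}						\
  	(typeof(*ptr))VAL;				\
  })
  #endif

  /* arch/powerpc/include/asm/processor.h: drop to low and return to
   * medium SMT priority on each pause. */
  #define cpu_relax()	do { HMT_low(); HMT_medium(); barrier(); } while (0)

By contrast, the removed powerpc implementation below drops to low
priority via spin_begin() (HMT_low()) once, and only returns to medium
priority via spin_end() (HMT_medium()) after the condition becomes
true, staying at low priority for the whole wait loop.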
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
 arch/powerpc/include/asm/barrier.h | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index aecfde829d5d..7ae29cfb06c0 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -80,22 +80,6 @@ do {									\
 	___p1;								\
 })
 
-#ifdef CONFIG_PPC64
-#define smp_cond_load_relaxed(ptr, cond_expr) ({		\
-	typeof(ptr) __PTR = (ptr);				\
-	__unqual_scalar_typeof(*ptr) VAL;			\
-	VAL = READ_ONCE(*__PTR);				\
-	if (unlikely(!(cond_expr))) {				\
-		spin_begin();					\
-		do {						\
-			VAL = READ_ONCE(*__PTR);		\
-		} while (!(cond_expr));				\
-		spin_end();					\
-	}							\
-	(typeof(*ptr))VAL;					\
-})
-#endif
-
 #ifdef CONFIG_PPC_BOOK3S_64
 #define NOSPEC_BARRIER_SLOT	nop
 #elif defined(CONFIG_PPC_FSL_BOOK3E)
-- 
2.26.2