Date: Thu, 15 Apr 2021 10:02:18 +0100
From: Catalin Marinas
To: Peter Zijlstra
Cc: Stafford Horne, Guo Ren, Christoph Müllner, Palmer Dabbelt,
    Anup Patel, linux-riscv, Linux Kernel Mailing List, Guo Ren,
    Will Deacon, Arnd Bergmann, jonas@southpole.se,
    stefan.kristiansson@saunalahti.fi
Subject: Re: [RFC][PATCH] locking: Generic ticket-lock
Message-ID: <20210415090215.GA1015@arm.com>
References: <20210414204734.GJ3288043@lianli.shorne-pla.net>

(fixed Will's email address)

On Thu, Apr 15, 2021 at 10:09:54AM +0200, Peter Zijlstra wrote:
> On Thu, Apr 15, 2021 at 05:47:34AM +0900, Stafford Horne wrote:
> > > How's this then? Compile tested only on openrisc/simple_smp_defconfig.
> >
> > I did my testing with this FPGA build SoC:
> >
> >   https://github.com/stffrdhrn/de0_nano-multicore
> >
> > Note, the CPU timer sync logic uses mb() and is a bit flaky. So missing mb()
> > might be a reason. I thought we had defined mb() and l.msync, but it seems to
> > have gotten lost.
> >
> > With that said I could test out this ticket-lock implementation. How would I
> > tell if its better than qspinlock?
>
> Mostly if it isn't worse, it's better for being *much* simpler. As you
> can see, the guts of ticket is like 16 lines of C (lock+unlock) and you
> only need the behaviour of atomic_fetch_add() to reason about behaviour
> of the whole thing. qspinlock OTOH is mind bending painful to reason
> about.
>
> There are some spinlock tests in locktorture; but back when I had a
> userspace copy of the lot and would measure min,avg,max acquire times
> under various contention loads (making sure to only run a single task
> per CPU etc.. to avoid lock holder preemption and other such 'fun'
> things).
>
> It took us a fair amount of work to get qspinlock to compete with ticket
> for low contention cases (by far the most common in the kernel), and it
> took a fairly large amount of CPUs for qspinlock to really win from
> ticket on the contended case. Your hardware may vary. In particular the
> access to the external cacheline (for queueing, see the queue: label in
> queued_spin_lock_slowpath) is a pain-point and the relative cost of
> cacheline misses for your arch determines where (and if) low contention
> behaviour is competitive.
>
> Also, less variance (the reason for the min/max measure) is better.
> Large variance is typically a sign of fwd progress trouble.

IIRC, one issue we had with ticket spinlocks on arm64 was on big.LITTLE
systems where the little CPUs were always last to get a ticket when
racing with the big cores. That was with load/store exclusives (LR/SC
style) and would have probably got better with atomics, but we moved to
qspinlocks eventually (the Juno board didn't have atomics).
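For illustration only (a rough sketch, not the RFC patch under
discussion): a generic ticket lock along the lines Peter describes,
built purely on atomic_fetch_add(), atomic_cond_read_acquire() and
smp_store_release(), might look like the code below. The
ticket_lock()/ticket_unlock() names are placeholders, and the unlock
assumes a little-endian layout of the lock word for brevity.

#include <linux/atomic.h>
#include <linux/types.h>

/*
 * Sketch of a 32-bit ticket lock: the high 16 bits hold the next ticket
 * to hand out, the low 16 bits hold the ticket currently being served.
 */
typedef atomic_t ticket_lock_t;

static inline void ticket_lock(ticket_lock_t *lock)
{
	/* Take a ticket: bump "next"; the old value is our ticket number. */
	u32 val = atomic_fetch_add(1 << 16, lock);
	u16 ticket = val >> 16;

	/* Uncontended case: the owner already equals our ticket. */
	if (ticket == (u16)val)
		return;

	/* Wait with acquire ordering until our ticket is being served. */
	atomic_cond_read_acquire(lock, ticket == (u16)VAL);
}

static inline void ticket_unlock(ticket_lock_t *lock)
{
	/* Little-endian assumption: the owner half-word comes first. */
	u16 *owner = (u16 *)lock;

	/* Hand the lock to the next ticket with release semantics. */
	smp_store_release(owner, (u16)(*owner + 1));
}

The WFE benefit Peter mentions below comes in through
atomic_cond_read_acquire(), which expands to smp_cond_load_acquire()
and hence to an LDXR/WFE loop on arm64.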
(leaving the rest of the text below for Will's convenience)

> That's not saying that qspinlock isn't awesome, but I'm arguing that you
> should get there by first trying all the simpler things. By gradually
> increasing complexity you can also find the problem spots (for your
> architecture) and you have something to fall back to in case of trouble.
>
> Now, the obvious selling point of qspinlock is that due to the MCS style
> nature of the thing it doesn't bounce the lock around, but that comes at
> a cost of having to use that extra cacheline (due to the kernel liking
> sizeof(spinlock_t) == sizeof(u32)). But things like ARM64's WFE (see
> smp_cond_load_acquire()) can shift the balance quite a bit on that front
> as well (ARM has a similar thing but less useful, see it's spinlock.h
> and look for wfe() and dsb_sev()).
>
> Once your arch hits NUMA, qspinlock is probably a win. However, low
> contention performance is still king for most workloads. Better high
> contention behaviour is nice.

-- 
Catalin