Date: Thu, 23 Jan 2020 11:35:47 +0000
From: Will Deacon
To: Waiman Long
Cc: Lihao Liang, Alex Kogan, linux@armlinux.org.uk, peterz@infradead.org,
    mingo@redhat.com, will.deacon@arm.com, arnd@arndb.de,
    linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org, tglx@linutronix.de, bp@alien8.de,
    hpa@zytor.com,
    x86@kernel.org, guohanjun@huawei.com, jglauber@marvell.com,
    dave.dice@oracle.com, steven.sistare@oracle.com, daniel.m.jordan@oracle.com
Subject: Re: [PATCH v9 0/5] Add NUMA-awareness to qspinlock
Message-ID: <20200123113547.GD18991@willie-the-truck>
References: <20200115035920.54451-1-alex.kogan@oracle.com>
 <4e15fa1d-9540-3274-502a-4195a0d46f63@redhat.com>
In-Reply-To: <4e15fa1d-9540-3274-502a-4195a0d46f63@redhat.com>
User-Agent: Mutt/1.10.1 (2018-07-13)

Hi folks,

(I think Lihao is travelling at the moment, so he may be delayed in his
replies)

On Wed, Jan 22, 2020 at 12:24:58PM -0500, Waiman Long wrote:
> On 1/22/20 6:45 AM, Lihao Liang wrote:
> > On Wed, Jan 22, 2020 at 10:28 AM Alex Kogan wrote:
> >> Summary
> >> -------
> >>
> >> Lock throughput can be increased by handing a lock to a waiter on the
> >> same NUMA node as the lock holder, provided care is taken to avoid
> >> starvation of waiters on other NUMA nodes. This patch introduces CNA
> >> (compact NUMA-aware lock) as the slow path for qspinlock. It is
> >> enabled through a configuration option (NUMA_AWARE_SPINLOCKS).
> >>
> > Thanks for your patches. The experimental results look promising!
> >
> > I understand that the new CNA qspinlock uses randomization to achieve
> > long-term fairness, and provides the numa_spinlock_threshold parameter
> > for users to tune. As Linux runs extremely diverse workloads, it is not
> > clear how randomization affects its fairness, and how users with
> > different requirements are supposed to tune this parameter.
> >
> > To this end, Will and I consider it beneficial to be able to answer the
> > following question:
> >
> > With different values of numa_spinlock_threshold and
> > SHUFFLE_REDUCTION_PROB_ARG, how long do threads running on different
> > sockets have to wait to acquire the lock? This is particularly relevant
> > in high-contention situations, when new threads keep arriving on the same
> > socket as the lock holder.
> >
> > In this email, I try to provide some formal analysis to address this
> > question. Let's assume the probability for the lock to stay on the
> > same socket is *at least* p, which corresponds to the probability for
> > the function probably(unsigned int num_bits) in the patch to return
> > *false*, where SHUFFLE_REDUCTION_PROB_ARG is passed as the value of
> > num_bits to the function.
>
> That is not strictly true from my understanding of the code. The
> probably() function does not come into play if a secondary queue is
> present. Also, calling cna_scan_main_queue() doesn't guarantee that a
> waiter on the same node can be found. So the simple mathematical
> analysis isn't that applicable in this case. One would have to do an
> actual simulation to find out what the actual behaviour will be.

It's certainly true that the analysis is based on the worst-case scenario,
but I think it's still worth considering. For example, the secondary queue
does not exist initially, so it seems a bit odd that we only instantiate it
with < 1% probability.

That said, my real concern with any of this is that it makes formal
modelling and analysis of qspinlock considerably more challenging. I would
/really/ like to see an update to the TLA+ model we have of the current
implementation [1], and preferably also to the userspace version I hacked
together [2], so that we can continue to test and validate changes to the
code outside of the usual kernel stress-testing.
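As a side note, the coin flip being discussed can be illustrated numerically.
The following is a hedged sketch, not the patch code: it models probably()
purely as described in this thread, i.e. a draw of num_bits random bits that
is "false" only when all bits are zero, so it returns false with probability
2^-num_bits. The value num_bits = 7 is an assumption used here only to match
the "< 1% probability" figure above (2^-7 = 1/128 ~ 0.78%).

```python
import random

def estimate_false_rate(num_bits, trials=1_000_000, seed=0):
    """Estimate how often a probably()-style coin flip comes up false.

    Models the check described in the thread: draw num_bits random bits
    and treat the result as false only when all bits are zero, which
    happens with probability 1 / 2**num_bits.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    falses = sum(rng.getrandbits(num_bits) == 0 for _ in range(trials))
    return falses / trials

# Assumed value: SHUFFLE_REDUCTION_PROB_ARG = 7 bits, giving a false
# (i.e. "shuffle / instantiate the secondary queue") rate of about
# 1/128, which is the "< 1%" figure mentioned above.
print(estimate_false_rate(7))
```

Under these assumptions the printed rate is close to 1/128, which is the
sense in which the secondary queue is only instantiated with < 1%
probability on any given flip.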
Will

[1] https://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/kernel-tla.git/
[2] https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/spinbench/