Message-ID: <156dc43f-fdcf-4643-83d9-b374452b0929@efficios.com>
Date: Thu, 23 May 2024 16:39:33 -0400
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 0/1] Add FUTEX_SPIN operation
To: André Almeida, Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Thomas Gleixner, "Paul E . McKenney",
 Boqun Feng, "H . Peter Anvin", Paul Turner, linux-api@vger.kernel.org,
 Christian Brauner, Florian Weimer, David.Laight@ACULAB.COM,
 carlos@redhat.com, Peter Oskolkov, Alexander Mikhalitsyn, Chris Kennelly,
 Ingo Molnar, Darren Hart, Davidlohr Bueso, libc-alpha@sourceware.org,
 Steven Rostedt, Jonathan Corbet, Noah Goldstein, longman@redhat.com,
 kernel-dev@igalia.com
References: <20240523200704.281514-1-andrealmeid@igalia.com>
From: Mathieu Desnoyers
In-Reply-To: <20240523200704.281514-1-andrealmeid@igalia.com>

On 2024-05-23 16:07, André Almeida wrote:
> Hi,
>
> In the last LPC, Mathieu Desnoyers and I presented[0] a proposal to extend the
> rseq interface to be able to implement spin locks in userspace correctly. Thomas
> Gleixner agreed that this is something that Linux could improve, but asked for
> an alternative proposal first: a futex operation that allows to spin a user
> lock inside the kernel. This patchset implements a prototype of this idea for
> further discussion.
>
> With FUTEX2_SPIN flag set during a futex_wait(), the futex value is expected to
> be the TID of the lock owner. Then, the kernel gets the task_struct of the
> corresponding TID, and checks if it's running.
> It spins until the futex
> is awaken, the task is scheduled out or if a timeout happens. If the lock owner
> is scheduled out at any time, then the syscall follows the normal path of
> sleeping as usual. The user input is masked with FUTEX_TID_MASK so we have some
> bits to play.
>
> If the futex is awaken and we are spinning, we can return to userspace quickly,
> avoid the scheduling out and in again to wake from a futex_wait(), thus
> speeding up the wait operation. The user input is masked with FUTEX_TID_MASK so
> we have some bits to play.
>
> Christian Brauner suggested using pidfd to avoid race conditions, and I will
> implement that in the next patch iteration. I benchmarked the implementation
> measuring the time required to wait for a futex for a simple loop using the code
> at [2]. In my setup, the total wait time for 1000 futexes using the spin method
> was almost 10% lower than just using the normal futex wait:
>
> Testing with FUTEX2_SPIN | FUTEX_WAIT
> Total wait time: 8650089 usecs
>
> Testing with FUTEX_WAIT
> Total wait time: 9447291 usecs
>
> However, as I played with how long the lock owner would be busy, the
> benchmark results of spinning vs no spinning would match, showing that the
> spinning will be effective for some specific scheduling scenarios, but depending
> on the wait time, there's no big difference either spinning or not.
>
> [0] https://lpc.events/event/17/contributions/1481/
>
> You can find a small snippet to play with this interface here:
>
> [1] https://gist.github.com/andrealmeid/f0b8c93a3c7a5c50458247c47f7078e1

What exactly are you trying to benchmark here?

I've looked at this toy program, and I suspect that most of the delay
you observe is due to the initial scheduling of a newly cloned thread,
because this is what is repeatedly being done in the delay you measure.

I would recommend changing this benchmark program to measure
something meaningful, e.g.:

- N threads repeatedly contending on a lock, until a "stop" flag is set,
- run for "duration" seconds, after which main() sets a "stop" flag,
- delay loop of "work_delay" us within the lock critical section,
- delay loop of "inactive_delay" us between locking attempts,
- measure the time it takes to grab the lock, report stats on this,
- measure the total number of operations done within the given
  "duration",
- report statistics on the number of operations per thread to see
  the impact on fairness.

Then run the program with the following constraints:

- Pin one thread per core, with nb threads <= nb cores. This should
  be a best-case scenario for spinning.
- Pin all threads to a single core. When nb threads > nb cores, this
  should be the worst-case scenario for spinning.
- Group things between those two extremes to see how things evolve.

I would not be surprised if, once you measure relevant delays, you
observe much better speedups than what you currently have.

Thanks,

Mathieu

>
> Changelog:
>
> v1: - s/PID/TID
> - masked user input with FUTEX_TID_MASK
> - add benchmark tool to the cover letter
> - dropped debug prints
> - added missing put_task_struct()
>
> André Almeida (1):
>   futex: Add FUTEX_SPIN operation
>
>  include/uapi/linux/futex.h |  2 +-
>  kernel/futex/futex.h       |  6 ++-
>  kernel/futex/waitwake.c    | 78 +++++++++++++++++++++++++++++++++++++-
>  3 files changed, 82 insertions(+), 4 deletions(-)
>

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
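
The benchmark structure recommended above could be sketched roughly as
follows. This is only a skeleton under stated assumptions: a plain
pthread mutex stands in for the lock under test (a FUTEX2_SPIN-based
lock would replace it), durations are given in milliseconds rather
than seconds to keep runs short, and every name here (bench_cfg,
bench_thread, run_bench, pin_mode) is made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

enum pin_mode { PIN_NONE, PIN_ONE_PER_CORE, PIN_SINGLE_CORE };

struct bench_cfg {
	int nthreads;           /* N contending threads */
	int duration_ms;        /* run time before "stop" is set */
	long work_delay_us;     /* delay loop inside the critical section */
	long inactive_delay_us; /* delay loop between locking attempts */
	enum pin_mode pin;
};

struct bench_thread {
	pthread_t handle;
	int id;
	uint64_t ops;     /* critical sections completed by this thread */
	uint64_t wait_ns; /* total time spent grabbing the lock */
};

static struct bench_cfg cfg;
static atomic_bool stop;
/* Stand-in lock (assumption): swap in the lock implementation under test. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Busy-wait so the thread stays on-CPU for the whole delay. */
static void delay_us(long us)
{
	uint64_t end = now_ns() + (uint64_t)us * 1000;

	while (now_ns() < end)
		;
}

static void *worker(void *arg)
{
	struct bench_thread *t = arg;

	if (cfg.pin != PIN_NONE) {
		cpu_set_t set;

		CPU_ZERO(&set);
		CPU_SET(cfg.pin == PIN_ONE_PER_CORE ? t->id : 0, &set);
		/* Best effort: a failure (e.g. CPU not available) is ignored. */
		pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
	}
	while (!atomic_load(&stop)) {
		uint64_t start = now_ns();

		pthread_mutex_lock(&lock);
		t->wait_ns += now_ns() - start;
		delay_us(cfg.work_delay_us);
		pthread_mutex_unlock(&lock);
		t->ops++;
		delay_us(cfg.inactive_delay_us);
	}
	return NULL;
}

/* Runs one configuration; fills threads[0..nthreads-1], returns total ops. */
uint64_t run_bench(const struct bench_cfg *c, struct bench_thread *threads)
{
	struct timespec d = { c->duration_ms / 1000,
			      (c->duration_ms % 1000) * 1000000L };
	uint64_t total = 0;

	assert(c->nthreads > 0);
	cfg = *c;
	atomic_store(&stop, false);
	for (int i = 0; i < c->nthreads; i++) {
		threads[i] = (struct bench_thread){ .id = i };
		int rc = pthread_create(&threads[i].handle, NULL, worker,
					&threads[i]);
		assert(rc == 0);
		(void)rc;
	}
	nanosleep(&d, NULL);
	atomic_store(&stop, true);
	for (int i = 0; i < c->nthreads; i++) {
		pthread_join(threads[i].handle, NULL);
		total += threads[i].ops;
	}
	return total;
}
```

A caller would invoke run_bench() once per configuration (for example
PIN_ONE_PER_CORE with nthreads <= nb cores, then PIN_SINGLE_CORE) and
compare the total ops, the per-thread ops for fairness, and
wait_ns / ops as the mean time to grab the lock.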