Date: Thu, 2 May 2024 10:45:41 +0200
From: Christian Brauner
To: André Almeida
Cc: Mathieu Desnoyers, Peter Zijlstra, Thomas Gleixner,
 linux-kernel@vger.kernel.org, "Paul E . McKenney", Boqun Feng,
 "H . Peter Anvin", Paul Turner, linux-api@vger.kernel.org,
 Florian Weimer, David.Laight@aculab.com, carlos@redhat.com,
 Peter Oskolkov, Alexander Mikhalitsyn, Chris Kennelly, Ingo Molnar,
 Darren Hart, Davidlohr Bueso, libc-alpha@sourceware.org,
 Steven Rostedt, Jonathan Corbet, Noah Goldstein, Daniel Colascione,
 longman@redhat.com, kernel-dev@igalia.com
Subject: Re: [RFC PATCH 0/1] Add FUTEX_SPIN operation
Message-ID: <20240502-gezeichnet-besonderen-d277879cd669@brauner>
References: <20240425204332.221162-1-andrealmeid@igalia.com>
 <20240426-gaumen-zweibeinig-3490b06e86c2@brauner>

On Wed, May 01, 2024 at 08:44:36PM -0300, André Almeida wrote:
> Hi Christian,
>
> On 26/04/2024 07:26, Christian Brauner wrote:
> > On Thu, Apr 25, 2024 at 05:43:31PM -0300, André Almeida wrote:
> > > Hi,
> > >
> > > In the last LPC, Mathieu Desnoyers and I presented[0] a proposal to extend the
> > > rseq interface to be able to implement spin locks in userspace correctly. Thomas
> > > Gleixner agreed that this is something that Linux could improve, but asked for
> > > an alternative proposal first: a futex operation that allows to spin a user
> > > lock inside the kernel. This patchset implements a prototype of this idea for
> > > further discussion.
> > >
> > > With FUTEX2_SPIN flag set during a futex_wait(), the futex value is expected to
> > > be the PID of the lock owner. Then, the kernel gets the task_struct of the
> > > corresponding PID, and checks if it's running. It spins until the futex
> > > is awaken, the task is scheduled out or if a timeout happens. If the lock owner
> > > is scheduled out at any time, then the syscall follows the normal path of
> > > sleeping as usual.
> > >
> > > If the futex is awaken and we are spinning, we can return to userspace quickly,
> > > avoid the scheduling out and in again to wake from a futex_wait(), thus
> > > speeding up the wait operation.
> > >
> > > I didn't manage to find a good mechanism to prevent race conditions between
> > > setting *futex = PID in userspace and doing find_get_task_by_vpid(PID) in kernel
> > > space, giving that there's enough room for the original PID owner exit and such
> > > PID to be relocated to another unrelated task in the system. I didn't performed
> >
> > One option would be to also allow pidfds. Starting with v6.9 they can be
> > used to reference individual threads.
> >
> > So for the really fast case where you have multiple threads and you
> > somehow may really do care about the impact of the atomic_long_inc() on
> > pidfd_file->f_count during fdget() (for the single-threaded case the
> > increment is elided), callers can pass the TID. But in cases where the
> > inc and put aren't a performance sensitive, you can use pidfds.
> >
>
> Thank you very much for making the effort here, much appreciated :)
>
> While I agree that pidfds would fix the PID race conditions, I will move
> this interface to support TIDs instead, as noted by Florian and Peter. With
> TID the race conditions are diminished I reckon?

Unless I'm missing something, the question here is PID (as in TGID,
i.e. the thread-group leader's id, returned by getpid()) vs. TID (the
thread-specific id, returned by gettid()). You want the thread-specific
id because you want to interact with the futex state of a specific
thread, not that of the thread-group leader.

Aside from that, TIDs are subject to the same race conditions as PIDs:
both are allocated from the same pool (see alloc_pid()), so a TID can be
recycled just like a PID.