Received: by 2002:a89:d88:0:b0:1fa:5c73:8e2d with SMTP id eb8csp141256lqb; Thu, 23 May 2024 13:20:04 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCUJLRxGTUMCOs4aDLbs9Khru9MeSOORxofpzBKY3RUbroasSc7f4jZiJDSiExCuPHA3RecOnu0qcza0sKh2DgiY/h4GmWchMIE8rQK7JA== X-Google-Smtp-Source: AGHT+IFvKyiDb6m4zjPKZGibzae74+Qz4SIslybEFKYqqENsTTB+86atavA2COAtNBRCLsqc8rAx X-Received: by 2002:a17:902:f20b:b0:1eb:acff:63bf with SMTP id d9443c01a7336-1f448d314c7mr3629325ad.37.1716495603797; Thu, 23 May 2024 13:20:03 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1716495603; cv=pass; d=google.com; s=arc-20160816; b=SwROyn2mZBce30D8KOdDorDJgWp/PM6ozFlLcv0TnIFlUu63qlbGxsUkHFic3NMbBs e9k+eIzGS6Fp3Zsae6J1utqq07K8Cv70yahzO9Mv0JgH4l4knTESIXZbJQ5J389j26S9 LFJrcly2qZrTP1gznRUmfWW9pIsvkGCbaOPuYyIE2nlI0OYYB9QSVSI95wnsmjUDlEdv 7fBZrkqAT/b7CmrjsBLL89qx6E4n/YaIXqmPEWeKXAy/vZl0mkDAbZuSFT0+MuxyC4yN 2DNKRhmZcVIdNWYizpdDpUNBiVOdifo/yjfkaL741KvAEGgOCllWinWJuldn4qf+5CkJ 73Gg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:in-reply-to:content-language:from :references:cc:to:subject:user-agent:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:date:message-id:dkim-signature; bh=iPT/X9yM/BhCrbb+zmmRdHV+0xUe6jv9uuPlc/K1eSU=; fh=RcMgX8SbGUkOyk2363dk97uOo9U4f6b7vmzTNGDoaBg=; b=XVOgn15Ycruryhk06DAAYELLh7oPYTCC/f03par+iWeLv6PJNyCUzXW5JJy0EDGjBX gg6BxTnHwBEL7JeRajYcyhf4IKFYKg2knY+YIQBXC7BdJ3qDAJyYARe7T4sk/7yBU04k 0kBKiT3okxETEgWPLHysiDg+bg1D2a5vVDBax+MUVr3sELy4OkbL1Kq5AJb5M2TEcbuS 1P+7Cgf6En8L8g8y6ZBFPYEwax8lsdSOop+PK5oMLm+F6a5hgylIKhHSeZvf1n8gXbNj 1dfsioqogm+Vmd6M2u6lFJirgIKcJp8J+VnHE5P8a6DpqK0lA8b2YVyQAsELV3G109tE Gyng==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@efficios.com header.s=smtpout1 header.b=umfVpYwE; arc=pass (i=1 spf=pass spfdomain=efficios.com dkim=pass dkdomain=efficios.com dmarc=pass fromdomain=efficios.com); spf=pass (google.com: domain of linux-kernel+bounces-187964-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-187964-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=efficios.com Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [139.178.88.99]) by mx.google.com with ESMTPS id d9443c01a7336-1f2f2215d3fsi98132545ad.151.2024.05.23.13.20.03 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 23 May 2024 13:20:03 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-187964-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; dkim=pass header.i=@efficios.com header.s=smtpout1 header.b=umfVpYwE; arc=pass (i=1 spf=pass spfdomain=efficios.com dkim=pass dkdomain=efficios.com dmarc=pass fromdomain=efficios.com); spf=pass (google.com: domain of linux-kernel+bounces-187964-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-187964-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=efficios.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 71D18284073 for ; Thu, 23 May 2024 20:20:03 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 85AA082D91; Thu, 23 May 2024 20:19:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=efficios.com header.i=@efficios.com header.b="umfVpYwE" Received: from smtpout.efficios.com (smtpout.efficios.com [167.114.26.122]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E4F5884D26; Thu, 23 May 2024 20:19:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=167.114.26.122 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716495585; cv=none; b=twb7SWH2HkU4ge63Vm77mvCCHxXrCHEBmk7Tw8Ts+Sbf9nEzNypkzGGEB2NnatzmbDPf+gQPzU6Mr1ym+VB08UcMnXFMZcRa0zNhNkwSiDh9u6hJtJ6hsg9s8HLcFUOcjszAEOPp5httliSfGVTe+LDhf/W+XfHREReKG43RO9w= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716495585; c=relaxed/simple; bh=DmtE3NmDfeuWXwvfKTneK6k7HMV4HXj51KzM63yhWco=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=pDQJW9dIaVCohDCE4VS7g/rWoeV+2shO7IeLMwAwLmYu6IWLo/qlSJpzhj88DyEVhimN/+pwIfJ41Bd0/GQVntciqUdtlFa64uU+qYFhZ5dws2Wjr6MuGViEQ1mXc9Gs4SI4zDxLND05iYq2gjE8a1WO6v0BhHuuogG5ZeoIy0U= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=efficios.com; spf=pass smtp.mailfrom=efficios.com; dkim=pass (2048-bit key) header.d=efficios.com header.i=@efficios.com header.b=umfVpYwE; arc=none smtp.client-ip=167.114.26.122 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=efficios.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=efficios.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=efficios.com; s=smtpout1; t=1716495581; bh=DmtE3NmDfeuWXwvfKTneK6k7HMV4HXj51KzM63yhWco=; h=Date:Subject:To:Cc:References:From:In-Reply-To:From; b=umfVpYwE4XiDyWL8dEt2yo1JMis6pngumarv4ZxNgvohxA0ncOapHILTr2ADJnA1Y II052msbqiMz3jrhE2WiNsx1h5trLVtbd3trxAOw3ZRY1B3gnJil5L1ICOl/XNe9TT pYHed4mPQ9GRRJPYgxC0alZ9SaYmF9MG/t8fktfaYH9HryxK9jLH53rz9mURFwJhiD VE9KZQS8BmJQmCTzt5m9u4UYnFEKmiKZji9HWszPg3fTfl4mfsbSy8OHyHnfdVt5Ky CDIQteUZudFjkx7iTmfJ4VnAS6hil0pnyvEfqaXHsT1EQbIoThIiwQedzGgepbuTSo tV7xpgx0LphGQ== Received: from [172.16.0.134] (192-222-143-198.qc.cable.ebox.net [192.222.143.198]) by smtpout.efficios.com (Postfix) with ESMTPSA id 4VlffP05fSz116k; Thu, 23 May 2024 16:19:40 -0400 (EDT) Message-ID: <62825712-36bc-483c-9bca-db3d9233b0d2@efficios.com> Date: Thu, 23 May 2024 16:20:16 -0400 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2 1/1] futex: Add FUTEX_SPIN operation To: =?UTF-8?Q?Andr=C3=A9_Almeida?= , Peter Zijlstra Cc: linux-kernel@vger.kernel.org, Thomas Gleixner , "Paul E . McKenney" , Boqun Feng , "H . Peter Anvin" , Paul Turner , linux-api@vger.kernel.org, Christian Brauner , Florian Weimer , David.Laight@ACULAB.COM, carlos@redhat.com, Peter Oskolkov , Alexander Mikhalitsyn , Chris Kennelly , Ingo Molnar , Darren Hart , Davidlohr Bueso , libc-alpha@sourceware.org, Steven Rostedt , Jonathan Corbet , Noah Goldstein , Daniel Colascione , longman@redhat.com, kernel-dev@igalia.com References: <20240523200704.281514-1-andrealmeid@igalia.com> <20240523200704.281514-2-andrealmeid@igalia.com> From: Mathieu Desnoyers Content-Language: en-US In-Reply-To: <20240523200704.281514-2-andrealmeid@igalia.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit On 2024-05-23 16:07, André Almeida wrote: > Add a new mode for futex wait, the futex spin. > > Given the FUTEX2_SPIN flag, parse the futex value as the TID of the lock > owner. Then, before going to the normal wait path, spins while the lock > owner is running in a different CPU, to avoid the whole context switch > operation and to quickly return to userspace. If the lock owner is not > running, just sleep as the normal futex wait path. > > The user value is masked with FUTEX_TID_MASK, to allow some bits for > future use. > > The check for the owner to be running or not is important to avoid > spinning for something that won't be released quickly. Userspace is > responsible on providing the proper TID, the kernel does a basic check. > > Signed-off-by: André Almeida > --- [...] > + > +static int futex_spin(struct futex_hash_bucket *hb, struct futex_q *q, > + struct hrtimer_sleeper *timeout, u32 uval) > +{ > + struct task_struct *p; > + pid_t pid = uval & FUTEX_TID_MASK; > + > + p = find_get_task_by_vpid(pid); > + > + /* no task found, maybe it already exited */ > + if (!p) { > + futex_q_unlock(hb); > + return -EAGAIN; > + } > + > + /* can't spin in a kernel task */ > + if (unlikely(p->flags & PF_KTHREAD)) { > + put_task_struct(p); > + futex_q_unlock(hb); > + return -EPERM; > + } > + > + futex_queue(q, hb); > + > + if (timeout) > + hrtimer_sleeper_start_expires(timeout, HRTIMER_MODE_ABS); > + > + while (1) { Infinite loops in other kernel/futex/ files appear to use "for (;;) {" instead. > + if (likely(!plist_node_empty(&q->list))) { > + if (timeout && !timeout->task) > + goto exit; > + > + if (task_on_cpu(p)) { > + /* spin */ You may want to add a "cpu_relax();" here to lessen the power consumption of this busy-loop. > + continue; > + } else { > + /* task is not running, sleep */ > + break; > + } > + } else { > + goto exit; > + } > + } > + > + /* spinning didn't work, go to the normal path */ > + set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE); I wonder if flipping the order between: set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE); futex_queue(q, hb); as done in futex_wait_queue() and what is done here matters ? Does it introduce a race where a wakeup could be missed ? If it's an issue, then setting the current state could be done before invoking futex_queue(), and whenever the spin exits, set it back to TASK_RUNNING. > + > + if (likely(!plist_node_empty(&q->list))) { > + if (!timeout || timeout->task) > + schedule(); > + } > + > + __set_current_state(TASK_RUNNING); > + > +exit: > + put_task_struct(p); > + return 0; > +} > + > /** > * futex_unqueue_multiple - Remove various futexes from their hash bucket > * @v: The list of futexes to unqueue > @@ -665,8 +732,15 @@ int __futex_wait(u32 __user *uaddr, unsigned int flags, u32 val, > if (ret) > return ret; > > - /* futex_queue and wait for wakeup, timeout, or a signal. */ > - futex_wait_queue(hb, &q, to); > + if (flags & FLAGS_SPIN) { > + ret = futex_spin(hb, &q, to, val); The empty line below could be removed. Thanks, Mathieu > + > + if (ret) > + return ret; > + } else { > + /* futex_queue and wait for wakeup, timeout, or a signal. */ > + futex_wait_queue(hb, &q, to); > + } > > /* If we were woken (and unqueued), we succeeded, whatever. */ > if (!futex_unqueue(&q)) -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com