From: André Almeida
To: Mathieu Desnoyers, Peter Zijlstra, Thomas Gleixner
Cc: linux-kernel@vger.kernel.org, "Paul E . McKenney", "Boqun Feng",
 "H . Peter Anvin", "Paul Turner", linux-api@vger.kernel.org,
 "Christian Brauner", "Florian Weimer", David.Laight@ACULAB.COM,
 carlos@redhat.com, "Peter Oskolkov", "Alexander Mikhalitsyn",
 "Chris Kennelly", "Ingo Molnar", "Darren Hart", "Davidlohr Bueso",
 André Almeida, libc-alpha@sourceware.org, "Steven Rostedt",
 "Jonathan Corbet", "Noah Goldstein", "Daniel Colascione",
 longman@redhat.com, kernel-dev@igalia.com
Subject: [RFC PATCH 1/1] futex: Add FUTEX_SPIN operation
Date: Thu, 25 Apr 2024 17:43:32 -0300
Message-ID: <20240425204332.221162-2-andrealmeid@igalia.com>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20240425204332.221162-1-andrealmeid@igalia.com>
References: <20240425204332.221162-1-andrealmeid@igalia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Add a new futex mode for futex wait, the futex spin.

Given the FUTEX2_SPIN flag, parse the futex value as the PID of the lock
owner. Then, before going to the normal wait path, spin while the lock
owner is running on a different CPU, to avoid the whole context switch
operation and to quickly return to userspace. If the lock owner is not
running, just sleep as in the normal futex wait path.

The check for the owner to be running or not is important to avoid
spinning for something that won't be released quickly.
Userspace is responsible for providing the proper PID; the kernel only
does a basic check.

Signed-off-by: André Almeida
---
 include/uapi/linux/futex.h |  2 +-
 kernel/futex/futex.h       |  6 ++-
 kernel/futex/waitwake.c    | 79 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/futex.h b/include/uapi/linux/futex.h
index d2ee625ea189..d77d692ffac2 100644
--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -63,7 +63,7 @@
 #define FUTEX2_SIZE_U32		0x02
 #define FUTEX2_SIZE_U64		0x03
 #define FUTEX2_NUMA		0x04
-			/*	0x08 */
+#define FUTEX2_SPIN		0x08
 			/*	0x10 */
 			/*	0x20 */
 			/*	0x40 */
diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index 8b195d06f4e8..180c1c10dc81 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -37,6 +37,7 @@
 #define FLAGS_HAS_TIMEOUT	0x0040
 #define FLAGS_NUMA		0x0080
 #define FLAGS_STRICT		0x0100
+#define FLAGS_SPIN		0x0200
 
 /* FUTEX_ to FLAGS_ */
 static inline unsigned int futex_to_flags(unsigned int op)
@@ -52,7 +53,7 @@ static inline unsigned int futex_to_flags(unsigned int op)
 	return flags;
 }
 
-#define FUTEX2_VALID_MASK (FUTEX2_SIZE_MASK | FUTEX2_PRIVATE)
+#define FUTEX2_VALID_MASK (FUTEX2_SIZE_MASK | FUTEX2_PRIVATE | FUTEX2_SPIN)
 
 /* FUTEX2_ to FLAGS_ */
 static inline unsigned int futex2_to_flags(unsigned int flags2)
@@ -65,6 +66,9 @@ static inline unsigned int futex2_to_flags(unsigned int flags2)
 
 	if (flags2 & FUTEX2_NUMA)
 		flags |= FLAGS_NUMA;
 
+	if (flags2 & FUTEX2_SPIN)
+		flags |= FLAGS_SPIN;
+
 	return flags;
 }
diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c
index 3a10375d9521..94feac92cf4f 100644
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -372,6 +372,78 @@ void futex_wait_queue(struct futex_hash_bucket *hb, struct futex_q *q,
 	__set_current_state(TASK_RUNNING);
 }
 
+static inline bool task_on_cpu(struct task_struct *p)
+{
+#ifdef CONFIG_SMP
+	return !!(p->on_cpu);
+#else
+	return false;
+#endif
+}
+
+static int futex_spin(struct futex_hash_bucket *hb, struct futex_q *q,
+		      struct hrtimer_sleeper *timeout, void __user *uaddr, u32 val)
+{
+	struct task_struct *p;
+	u32 pid, uval;
+	unsigned int i = 0;
+
+	if (futex_get_value_locked(&uval, uaddr))
+		return -EFAULT;
+
+	pid = uval;
+
+	p = find_get_task_by_vpid(pid);
+	if (!p) {
+		printk("%s: no task found with PID %u\n", __func__, pid);
+		return -EAGAIN;
+	}
+
+	if (unlikely(p->flags & PF_KTHREAD)) {
+		put_task_struct(p);
+		printk("%s: can't spin in a kernel task\n", __func__);
+		return -EPERM;
+	}
+
+	futex_queue(q, hb);
+
+	if (timeout)
+		hrtimer_sleeper_start_expires(timeout, HRTIMER_MODE_ABS);
+
+	while (1) {
+		if (likely(!plist_node_empty(&q->list))) {
+			if (timeout && !timeout->task)
+				return 0;
+
+			/* spin */
+			if (task_on_cpu(p)) {
+				i++;
+				continue;
+			/* task is not running, sleep */
+			} else {
+				break;
+			}
+		} else {
+			printk("%s: woke after %u spins\n", __func__, i);
+			return 0;
+		}
+	}
+
+	printk("%s: spun %u times, sleeping\n", __func__, i);
+
+	/* spinning didn't work, go to the normal path */
+	set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
+
+	if (likely(!plist_node_empty(&q->list))) {
+		if (!timeout || timeout->task)
+			schedule();
+	}
+
+	__set_current_state(TASK_RUNNING);
+
+	return 0;
+}
+
 /**
  * futex_unqueue_multiple - Remove various futexes from their hash bucket
  * @v: The list of futexes to unqueue
@@ -665,8 +737,11 @@ int __futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
 	if (ret)
 		return ret;
 
-	/* futex_queue and wait for wakeup, timeout, or a signal. */
-	futex_wait_queue(hb, &q, to);
+	if (flags & FLAGS_SPIN)
+		futex_spin(hb, &q, to, uaddr, val);
+	else
+		/* futex_queue and wait for wakeup, timeout, or a signal. */
+		futex_wait_queue(hb, &q, to);
 
 	/* If we were woken (and unqueued), we succeeded, whatever. */
 	if (!futex_unqueue(&q))
-- 
2.44.0
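
For illustration only, here is a minimal userspace sketch of how a lock
might use the mode described in the commit message. It is not part of the
patch. It assumes the futex word holds the owner's TID (0 meaning
unlocked), so that the kernel can look the owner up, and that the flag is
passed through the futex_wait() syscall together with FUTEX2_SIZE_U32 and
FUTEX2_PRIVATE. The names spin_futex_lock, sfl_lock(), sfl_unlock(),
self_tid() and wait_for_owner() are hypothetical, and the FUTEX2_SPIN
value is the one proposed here, not something found in released headers.
On a kernel without this patch the wait is expected to fail with EINVAL
or ENOSYS, in which case the sketch falls back to a plain FUTEX_WAIT.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/futex.h>

/* Assumption: value proposed by this RFC, absent from released headers. */
#ifndef FUTEX2_SPIN
#define FUTEX2_SPIN 0x08
#endif
#ifndef FUTEX2_SIZE_U32
#define FUTEX2_SIZE_U32 0x02
#endif
#ifndef FUTEX2_PRIVATE
#define FUTEX2_PRIVATE FUTEX_PRIVATE_FLAG
#endif

/* Hypothetical lock type: 0 means unlocked, otherwise the owner's TID. */
typedef struct {
	_Atomic uint32_t owner;
} spin_futex_lock;

static uint32_t self_tid(void)
{
	return (uint32_t)syscall(SYS_gettid);
}

/* Block while the lock word still equals 'expected' (the owner's TID). */
static void wait_for_owner(spin_futex_lock *l, uint32_t expected)
{
#ifdef SYS_futex_wait
	long ret = syscall(SYS_futex_wait, &l->owner, (unsigned long)expected,
			   (unsigned long)FUTEX_BITSET_MATCH_ANY,
			   FUTEX2_SIZE_U32 | FUTEX2_PRIVATE | FUTEX2_SPIN,
			   NULL, CLOCK_MONOTONIC);
	/* Woken, or the word changed: let the caller retry the lock. */
	if (ret == 0 || (errno != EINVAL && errno != ENOSYS))
		return;
#endif
	/* Flag or syscall not supported: fall back to a plain futex wait. */
	syscall(SYS_futex, &l->owner, FUTEX_WAIT_PRIVATE, expected,
		NULL, NULL, 0);
}

static void sfl_lock(spin_futex_lock *l)
{
	uint32_t tid = self_tid();

	for (;;) {
		uint32_t cur = 0;

		/* Try to take the lock by publishing our TID. */
		if (atomic_compare_exchange_strong(&l->owner, &cur, tid))
			return;

		/* 'cur' now holds the current owner's TID. */
		wait_for_owner(l, cur);
	}
}

static void sfl_unlock(spin_futex_lock *l)
{
	uint32_t prev = atomic_exchange(&l->owner, 0);

	assert(prev == self_tid());
	(void)prev;

	/* No waiter bit in this sketch, so always wake one waiter. */
	syscall(SYS_futex, &l->owner, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
}
```

Because the whole word is taken up by the TID, this sketch cannot tell
whether anybody is waiting and issues a wake on every unlock. A real lock
would probably want to avoid that, but how a waiter bit would interact
with the PID lookup is left open here.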