Received: by 2002:ab2:69cc:0:b0:1f4:be93:e15a with SMTP id n12csp1786155lqp; Mon, 15 Apr 2024 18:33:31 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCWRybmP2oC4+hWIvXi68KkkaM3HT/s+oB5/2Pl0A+PV6f6nNm5yh55iHO8/lK90nPAgifwx2X2uK5/BwiM27p993u5zlbdDkHtrC2iwVQ== X-Google-Smtp-Source: AGHT+IFOVCfCjEbS6vWhaBB9JoMOSK7BBuwlJ95iAUIEIZBGbo6Mby5mDhneIKD+7o3/a3wxc8d0 X-Received: by 2002:a05:6870:4597:b0:22e:dfbc:4d90 with SMTP id y23-20020a056870459700b0022edfbc4d90mr13437714oao.5.1713231211214; Mon, 15 Apr 2024 18:33:31 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1713231211; cv=pass; d=google.com; s=arc-20160816; b=RxcZMCeK0cokBx32c32rj9J3fkM9UI8ukL9KSwEmcWu/2OtJUb2f4phXpxcGk3R7BY GOceDZKdsrZAj6q0FzN4nyI1nTBDnB8/xq4THfC9VSKspubHnXmremF0L89Q67JvIIut Zc95kck/XT8O0rG7Rru6Keu2jmSyDPkjwWOC/2zXuEJKhu5NKyc2Jp2o2RXVPaEtiAET m1BzxEPvXh/Nf9V7TqRgU90RPnSrg3biMsw2oC8RysbVX2yMPV+bh+NdjLb4t7zsAcDa KZtMmArCHr0GEJRffat/yg1nVvUTw2AXWostirBoVwvLIOcHm6idnnPd29qUbVqcCDVI NGCw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=RYBMOq1phKF6kVrcJnfpZPVkjj2AQtC0kLQdg9MZn2g=; fh=eVY06L/C5FDTmF2RNqfudsnjwkB3tbhT/y33cGhZc0I=; b=kcfDGJJoiIDRoP9nbmSrKs4ki+nb2rt9qPSip6g5B5i9IIqz7oEXBLyQLswlBRfUjW NrGitHhmiyqaSjqIfNrD36aDxQnJFJYmootLkTMFCWm5anUogFcOIXZHzWVDSOo9XcJq l2vd/QTdgQdB8MWNgerpXTmXNY+TNCAcOneCo/GT6w/moXrJh87idn2jgeICX/C1Gr/O TG2BznRGiO9lzCGNyliKtVnVV5r7p8miNJsHscgGeM4FNAFTFJD1cL7JsAr7TYxP6r0B ySOqMi2mQEcTWA7/ZgaaqKXgNqXSE4ZH2usXbww9dHJiO1B/qSyZTYv6I1Aboa/eLyeA 8BdQ==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@codeweavers.com header.s=s1 header.b=n5AlHg5b; arc=pass (i=1 spf=pass spfdomain=codeweavers.com dkim=pass dkdomain=codeweavers.com dmarc=pass fromdomain=codeweavers.com); spf=pass (google.com: domain of linux-kernel+bounces-146068-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-146068-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=codeweavers.com Return-Path: Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. [147.75.48.161]) by mx.google.com with ESMTPS id z127-20020a626585000000b006eabb98a961si8922084pfb.207.2024.04.15.18.33.30 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 15 Apr 2024 18:33:31 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-146068-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) client-ip=147.75.48.161; Authentication-Results: mx.google.com; dkim=pass header.i=@codeweavers.com header.s=s1 header.b=n5AlHg5b; arc=pass (i=1 spf=pass spfdomain=codeweavers.com dkim=pass dkdomain=codeweavers.com dmarc=pass fromdomain=codeweavers.com); spf=pass (google.com: domain of linux-kernel+bounces-146068-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-146068-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=codeweavers.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id 02FF0B22A17 for ; Tue, 16 Apr 2024 01:13:41 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 7C8D682C8E; Tue, 16 Apr 2024 01:10:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=codeweavers.com header.i=@codeweavers.com header.b="n5AlHg5b" Received: from mail.codeweavers.com (mail.codeweavers.com [4.36.192.163]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BFDF57483; Tue, 16 Apr 2024 01:10:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=4.36.192.163 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713229832; cv=none; b=pDGTo05AdGIHCog1Uk+Vl5eOHBuUh1znkDAir2T8FSP9PrDdWQFrkvGExSFhxYcvaIYzfJEGYfiA4H8/VSWJhRpsQvabp5nX1sugkYuRjDEJn1/Z+MKfP9FtD9sG1NmEbPfVaz9BR0Anr7Aopej51Jf4KU+qOD/r7jMG1cNpLnc= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713229832; c=relaxed/simple; bh=iCBKSs88BnhIi4wRYcrHmulPU5noVaFJKgE+8Pd4dp0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=r/vDwIh5PswaINmzRWw9R6uT0inWgpJhkDFn85EXTLBgCP6R6UzOM6eCWfbuv+K4bFWSsFQfwmugta7MZ/XP4iXN5xPAtvf3kljhaGpufPkdIRBTfGBv8OcqpsMKvaFLOoRR1X3ypu9/xDY/xi6ZHLT9lK0WBp4cXPPkb+EjBIM= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=codeweavers.com; spf=pass smtp.mailfrom=codeweavers.com; dkim=pass (2048-bit key) header.d=codeweavers.com header.i=@codeweavers.com header.b=n5AlHg5b; arc=none smtp.client-ip=4.36.192.163 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=codeweavers.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=codeweavers.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=codeweavers.com; s=s1; h=Message-ID:Date:Subject:Cc:To:From:Sender; bh=RYBMOq1phKF6kVrcJnfpZPVkjj2AQtC0kLQdg9MZn2g=; b=n5AlHg5b5Y/vtM+O8bN/DqgZsO 1GFW2Ifz9efIqvZeHPR7lSN0C3jD6/JvgdXafKWTh3RLxf1QOVBRNPgWRKSeMCF+rVgkTtz0NYLuX GabTHhdXdl8KCqOiHynjSOnr5Tub/2TgWt+6ibuZEIsBzxOWqu0Y3HNnNJDCdDt0GxNbGoV8C8zyY d+aRlRVvMJa7clCEUAVoqa1usznV4zm17NHZECoAW7+A5W9GHjjtwSL2zgji9yNi8Fb22tX02LzA3 W5I8qe9XuVVkWGr2Trj336NPvQ0Mqw1gcu5H8eKM+Wz8dNWUshQnc/iGmKSq2OIhyjqQtGFoH0Sjn joUOq3Ew==; Received: from cw137ip160.mn.codeweavers.com ([10.69.137.160] helo=camazotz.mn.codeweavers.com) by mail.codeweavers.com with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1rwXKy-00FbQv-1E; Mon, 15 Apr 2024 20:10:16 -0500 From: Elizabeth Figura To: Arnd Bergmann , Greg Kroah-Hartman , Jonathan Corbet , Shuah Khan Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, wine-devel@winehq.org, =?UTF-8?q?Andr=C3=A9=20Almeida?= , Wolfram Sang , Arkadiusz Hiler , Peter Zijlstra , Andy Lutomirski , linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Randy Dunlap , Ingo Molnar , Will Deacon , Waiman Long , Boqun Feng , Elizabeth Figura Subject: [PATCH v4 01/27] ntsync: Introduce NTSYNC_IOC_WAIT_ANY. Date: Mon, 15 Apr 2024 20:08:11 -0500 Message-ID: <20240416010837.333694-2-zfigura@codeweavers.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240416010837.333694-1-zfigura@codeweavers.com> References: <20240416010837.333694-1-zfigura@codeweavers.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit This corresponds to part of the functionality of the NT syscall NtWaitForMultipleObjects(). Specifically, it implements the behaviour where the third argument (wait_any) is TRUE, and it does not handle alertable waits. Those features have been split out into separate patches to ease review. This patch therefore implements the wait/wake infrastructure which comprises the core of ntsync's functionality. NTSYNC_IOC_WAIT_ANY is a vectored wait function similar to poll(). Unlike poll(), it "consumes" objects when they are signaled. For semaphores, this means decreasing one from the internal counter. At most one object can be consumed by this function. This wait/wake model is fundamentally different from that used anywhere else in the kernel, and for that reason ntsync does not use any existing infrastructure, such as futexes, kernel mutexes or semaphores, or wait_event(). Up to 64 objects can be waited on at once. As soon as one is signaled, the object with the lowest index is consumed, and that index is returned via the "index" field. A timeout is supported. The timeout is passed as a u64 nanosecond value, which represents absolute time measured against either the MONOTONIC or REALTIME clock (controlled by the flags argument). If U64_MAX is passed, the ioctl waits indefinitely. This ioctl validates that all objects belong to the relevant device. This is not necessary for any technical reason related to NTSYNC_IOC_WAIT_ANY, but will be necessary for NTSYNC_IOC_WAIT_ALL introduced in the following patch. Two u32s of padding are left in the ntsync_wait_args structure; one will be used by a patch later in the series (which is split out to ease review). Signed-off-by: Elizabeth Figura --- drivers/misc/ntsync.c | 250 ++++++++++++++++++++++++++++++++++++ include/uapi/linux/ntsync.h | 16 +++ 2 files changed, 266 insertions(+) diff --git a/drivers/misc/ntsync.c b/drivers/misc/ntsync.c index 3c2f743c58b0..c6f84a5fc8c0 100644 --- a/drivers/misc/ntsync.c +++ b/drivers/misc/ntsync.c @@ -6,11 +6,16 @@ */ #include +#include #include #include +#include +#include #include #include #include +#include +#include #include #include #include @@ -30,6 +35,8 @@ enum ntsync_type { * * Both rely on struct file for reference counting. Individual * ntsync_obj objects take a reference to the device when created. + * Wait operations take a reference to each object being waited on for + * the duration of the wait. */ struct ntsync_obj { @@ -47,12 +54,56 @@ struct ntsync_obj { __u32 max; } sem; } u; + + struct list_head any_waiters; +}; + +struct ntsync_q_entry { + struct list_head node; + struct ntsync_q *q; + struct ntsync_obj *obj; + __u32 index; +}; + +struct ntsync_q { + struct task_struct *task; + __u32 owner; + + /* + * Protected via atomic_try_cmpxchg(). Only the thread that wins the + * compare-and-swap may actually change object states and wake this + * task. + */ + atomic_t signaled; + + __u32 count; + struct ntsync_q_entry entries[]; }; struct ntsync_device { struct file *file; }; +static void try_wake_any_sem(struct ntsync_obj *sem) +{ + struct ntsync_q_entry *entry; + + lockdep_assert_held(&sem->lock); + + list_for_each_entry(entry, &sem->any_waiters, node) { + struct ntsync_q *q = entry->q; + int signaled = -1; + + if (!sem->u.sem.count) + break; + + if (atomic_try_cmpxchg(&q->signaled, &signaled, entry->index)) { + sem->u.sem.count--; + wake_up_process(q->task); + } + } +} + /* * Actually change the semaphore state, returning -EOVERFLOW if it is made * invalid. @@ -88,6 +139,8 @@ static int ntsync_sem_post(struct ntsync_obj *sem, void __user *argp) prev_count = sem->u.sem.count; ret = post_sem_state(sem, args); + if (!ret) + try_wake_any_sem(sem); spin_unlock(&sem->lock); @@ -141,6 +194,7 @@ static struct ntsync_obj *ntsync_alloc_obj(struct ntsync_device *dev, obj->dev = dev; get_file(dev->file); spin_lock_init(&obj->lock); + INIT_LIST_HEAD(&obj->any_waiters); return obj; } @@ -191,6 +245,200 @@ static int ntsync_create_sem(struct ntsync_device *dev, void __user *argp) return put_user(fd, &user_args->sem); } +static struct ntsync_obj *get_obj(struct ntsync_device *dev, int fd) +{ + struct file *file = fget(fd); + struct ntsync_obj *obj; + + if (!file) + return NULL; + + if (file->f_op != &ntsync_obj_fops) { + fput(file); + return NULL; + } + + obj = file->private_data; + if (obj->dev != dev) { + fput(file); + return NULL; + } + + return obj; +} + +static void put_obj(struct ntsync_obj *obj) +{ + fput(obj->file); +} + +static int ntsync_schedule(const struct ntsync_q *q, const struct ntsync_wait_args *args) +{ + ktime_t timeout = ns_to_ktime(args->timeout); + clockid_t clock = CLOCK_MONOTONIC; + ktime_t *timeout_ptr; + int ret = 0; + + timeout_ptr = (args->timeout == U64_MAX ? NULL : &timeout); + + if (args->flags & NTSYNC_WAIT_REALTIME) + clock = CLOCK_REALTIME; + + do { + if (signal_pending(current)) { + ret = -ERESTARTSYS; + break; + } + + set_current_state(TASK_INTERRUPTIBLE); + if (atomic_read(&q->signaled) != -1) { + ret = 0; + break; + } + ret = schedule_hrtimeout_range_clock(timeout_ptr, 0, HRTIMER_MODE_ABS, clock); + } while (ret < 0); + __set_current_state(TASK_RUNNING); + + return ret; +} + +/* + * Allocate and initialize the ntsync_q structure, but do not queue us yet. + */ +static int setup_wait(struct ntsync_device *dev, + const struct ntsync_wait_args *args, + struct ntsync_q **ret_q) +{ + const __u32 count = args->count; + int fds[NTSYNC_MAX_WAIT_COUNT]; + struct ntsync_q *q; + __u32 i, j; + + if (!args->owner) + return -EINVAL; + + if (args->pad || args->pad2 || (args->flags & ~NTSYNC_WAIT_REALTIME)) + return -EINVAL; + + if (args->count > NTSYNC_MAX_WAIT_COUNT) + return -EINVAL; + + if (copy_from_user(fds, u64_to_user_ptr(args->objs), + array_size(count, sizeof(*fds)))) + return -EFAULT; + + q = kmalloc(struct_size(q, entries, count), GFP_KERNEL); + if (!q) + return -ENOMEM; + q->task = current; + q->owner = args->owner; + atomic_set(&q->signaled, -1); + q->count = count; + + for (i = 0; i < count; i++) { + struct ntsync_q_entry *entry = &q->entries[i]; + struct ntsync_obj *obj = get_obj(dev, fds[i]); + + if (!obj) + goto err; + + entry->obj = obj; + entry->q = q; + entry->index = i; + } + + *ret_q = q; + return 0; + +err: + for (j = 0; j < i; j++) + put_obj(q->entries[j].obj); + kfree(q); + return -EINVAL; +} + +static void try_wake_any_obj(struct ntsync_obj *obj) +{ + switch (obj->type) { + case NTSYNC_TYPE_SEM: + try_wake_any_sem(obj); + break; + } +} + +static int ntsync_wait_any(struct ntsync_device *dev, void __user *argp) +{ + struct ntsync_wait_args args; + struct ntsync_q *q; + int signaled; + __u32 i; + int ret; + + if (copy_from_user(&args, argp, sizeof(args))) + return -EFAULT; + + ret = setup_wait(dev, &args, &q); + if (ret < 0) + return ret; + + /* queue ourselves */ + + for (i = 0; i < args.count; i++) { + struct ntsync_q_entry *entry = &q->entries[i]; + struct ntsync_obj *obj = entry->obj; + + spin_lock(&obj->lock); + list_add_tail(&entry->node, &obj->any_waiters); + spin_unlock(&obj->lock); + } + + /* check if we are already signaled */ + + for (i = 0; i < args.count; i++) { + struct ntsync_obj *obj = q->entries[i].obj; + + if (atomic_read(&q->signaled) != -1) + break; + + spin_lock(&obj->lock); + try_wake_any_obj(obj); + spin_unlock(&obj->lock); + } + + /* sleep */ + + ret = ntsync_schedule(q, &args); + + /* and finally, unqueue */ + + for (i = 0; i < args.count; i++) { + struct ntsync_q_entry *entry = &q->entries[i]; + struct ntsync_obj *obj = entry->obj; + + spin_lock(&obj->lock); + list_del(&entry->node); + spin_unlock(&obj->lock); + + put_obj(obj); + } + + signaled = atomic_read(&q->signaled); + if (signaled != -1) { + struct ntsync_wait_args __user *user_args = argp; + + /* even if we caught a signal, we need to communicate success */ + ret = 0; + + if (put_user(signaled, &user_args->index)) + ret = -EFAULT; + } else if (!ret) { + ret = -ETIMEDOUT; + } + + kfree(q); + return ret; +} + static int ntsync_char_open(struct inode *inode, struct file *file) { struct ntsync_device *dev; @@ -222,6 +470,8 @@ static long ntsync_char_ioctl(struct file *file, unsigned int cmd, switch (cmd) { case NTSYNC_IOC_CREATE_SEM: return ntsync_create_sem(dev, argp); + case NTSYNC_IOC_WAIT_ANY: + return ntsync_wait_any(dev, argp); default: return -ENOIOCTLCMD; } diff --git a/include/uapi/linux/ntsync.h b/include/uapi/linux/ntsync.h index dcfa38fdc93c..60ad414b5552 100644 --- a/include/uapi/linux/ntsync.h +++ b/include/uapi/linux/ntsync.h @@ -16,7 +16,23 @@ struct ntsync_sem_args { __u32 max; }; +#define NTSYNC_WAIT_REALTIME 0x1 + +struct ntsync_wait_args { + __u64 timeout; + __u64 objs; + __u32 count; + __u32 owner; + __u32 index; + __u32 flags; + __u32 pad; + __u32 pad2; +}; + +#define NTSYNC_MAX_WAIT_COUNT 64 + #define NTSYNC_IOC_CREATE_SEM _IOWR('N', 0x80, struct ntsync_sem_args) +#define NTSYNC_IOC_WAIT_ANY _IOWR('N', 0x82, struct ntsync_wait_args) #define NTSYNC_IOC_SEM_POST _IOWR('N', 0x81, __u32) -- 2.43.0