From: Jens Axboe
To: linux-fsdevel@vger.kernel.org
Cc: brauner@kernel.org, linux-kernel@vger.kernel.org, Jens Axboe
Subject: [PATCH 2/3] userfaultfd: convert to ->read_iter()
Date: Tue, 2 Apr 2024 14:18:22 -0600
Message-ID: <20240402202524.1514963-3-axboe@kernel.dk>
X-Mailer: git-send-email 2.43.0
In-Reply-To:
 <20240402202524.1514963-1-axboe@kernel.dk>
References: <20240402202524.1514963-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Rather than use the older style ->read() hook, use ->read_iter() so that
userfaultfd can support both O_NONBLOCK and IOCB_NOWAIT for non-blocking
read attempts.

Split the fd setup into two parts, so that userfaultfd can mark the file
mode with FMODE_NOWAIT before installing it into the process table. With
that, we can also defer grabbing the mm until we know the rest will
succeed, as the fd isn't visible before then.

Signed-off-by: Jens Axboe
---
 fs/userfaultfd.c | 43 +++++++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 60dcfafdc11a..7864c2dba858 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -282,7 +282,7 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 /*
  * Verify the pagetables are still not ok after having reigstered into
  * the fault_pending_wqh to avoid userland having to UFFDIO_WAKE any
- * userfault that has already been resolved, if userfaultfd_read and
+ * userfault that has already been resolved, if userfaultfd_read_iter and
  * UFFDIO_COPY|ZEROPAGE are being run simultaneously on two different
  * threads.
  */
@@ -1177,34 +1177,34 @@ static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
 	return ret;
 }
 
-static ssize_t userfaultfd_read(struct file *file, char __user *buf,
-				size_t count, loff_t *ppos)
+static ssize_t userfaultfd_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
+	struct file *file = iocb->ki_filp;
 	struct userfaultfd_ctx *ctx = file->private_data;
 	ssize_t _ret, ret = 0;
 	struct uffd_msg msg;
-	int no_wait = file->f_flags & O_NONBLOCK;
 	struct inode *inode = file_inode(file);
+	bool no_wait;
 
 	if (!userfaultfd_is_initialized(ctx))
 		return -EINVAL;
 
+	no_wait = file->f_flags & O_NONBLOCK || iocb->ki_flags & IOCB_NOWAIT;
 	for (;;) {
-		if (count < sizeof(msg))
+		if (iov_iter_count(to) < sizeof(msg))
 			return ret ? ret : -EINVAL;
 		_ret = userfaultfd_ctx_read(ctx, no_wait, &msg, inode);
 		if (_ret < 0)
 			return ret ? ret : _ret;
-		if (copy_to_user((__u64 __user *) buf, &msg, sizeof(msg)))
+		_ret = !copy_to_iter(&msg, sizeof(msg), to);
+		if (_ret)
 			return ret ? ret : -EFAULT;
 		ret += sizeof(msg);
-		buf += sizeof(msg);
-		count -= sizeof(msg);
 		/*
 		 * Allow to read more than one fault at time but only
 		 * block if waiting for the very first one.
 		 */
-		no_wait = O_NONBLOCK;
+		no_wait = true;
 	}
 }
 
@@ -2172,7 +2172,7 @@ static const struct file_operations userfaultfd_fops = {
 #endif
 	.release = userfaultfd_release,
 	.poll = userfaultfd_poll,
-	.read = userfaultfd_read,
+	.read_iter = userfaultfd_read_iter,
 	.unlocked_ioctl = userfaultfd_ioctl,
 	.compat_ioctl = compat_ptr_ioctl,
 	.llseek = noop_llseek,
@@ -2192,6 +2192,7 @@ static void init_once_userfaultfd_ctx(void *mem)
 static int new_userfaultfd(int flags)
 {
 	struct userfaultfd_ctx *ctx;
+	struct file *file;
 	int fd;
 
 	BUG_ON(!current->mm);
@@ -2215,16 +2216,26 @@ static int new_userfaultfd(int flags)
 	init_rwsem(&ctx->map_changing_lock);
 	atomic_set(&ctx->mmap_changing, 0);
 	ctx->mm = current->mm;
-	/* prevent the mm struct to be freed */
-	mmgrab(ctx->mm);
+
+	fd = get_unused_fd_flags(O_RDONLY | (flags & UFFD_SHARED_FCNTL_FLAGS));
+	if (fd < 0)
+		goto err_out;
 
 	/* Create a new inode so that the LSM can block the creation. */
-	fd = anon_inode_create_getfd("[userfaultfd]", &userfaultfd_fops, ctx,
+	file = anon_inode_create_getfile("[userfaultfd]", &userfaultfd_fops, ctx,
 			O_RDONLY | (flags & UFFD_SHARED_FCNTL_FLAGS), NULL);
-	if (fd < 0) {
-		mmdrop(ctx->mm);
-		kmem_cache_free(userfaultfd_ctx_cachep, ctx);
+	if (IS_ERR(file)) {
+		put_unused_fd(fd);
+		fd = PTR_ERR(file);
+		goto err_out;
 	}
+	/* prevent the mm struct to be freed */
+	mmgrab(ctx->mm);
+	file->f_mode |= FMODE_NOWAIT;
+	fd_install(fd, file);
+	return fd;
+err_out:
+	kmem_cache_free(userfaultfd_ctx_cachep, ctx);
 	return fd;
 }
 
-- 
2.43.0
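
For illustration only, the blocking policy of the read loop can be
modeled in plain userspace C. This is a rough sketch under stated
assumptions, not kernel code: struct model_msg, struct model_queue,
model_pop() and model_read() are hypothetical names invented here, the
queue stands in for the fault wait queues, and a plain memcpy() stands
in for copy_to_iter(). Since the model cannot sleep, a would-be blocking
wait is only counted. It mirrors three rules from the patch: a buffer
too small for one message gives -EINVAL, O_NONBLOCK or IOCB_NOWAIT both
make the read non-blocking, and only the very first message may be
waited for.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for struct uffd_msg: one fixed-size event. */
struct model_msg {
	unsigned long long payload;
};

/* Hypothetical pending-event queue (no wrap-around, model only). */
struct model_queue {
	struct model_msg msgs[8];
	size_t head, tail;
	unsigned int would_block;	/* times a blocking wait was needed */
};

/*
 * Pop one message. An empty queue yields -EAGAIN; when no_wait is
 * false the real code would sleep, so the model counts that instead.
 */
static ssize_t model_pop(struct model_queue *q, bool no_wait,
			 struct model_msg *out)
{
	if (q->head == q->tail) {
		if (!no_wait)
			q->would_block++;
		return -EAGAIN;
	}
	*out = q->msgs[q->head++];
	return 0;
}

/* Same loop shape as the patch: copy out whole messages only. */
static ssize_t model_read(struct model_queue *q, bool o_nonblock,
			  bool iocb_nowait, char *buf, size_t count)
{
	/* Either flag is enough to make the attempt non-blocking. */
	bool no_wait = o_nonblock || iocb_nowait;
	ssize_t _ret, ret = 0;
	struct model_msg msg;

	for (;;) {
		if (count < sizeof(msg))
			return ret ? ret : -EINVAL;
		_ret = model_pop(q, no_wait, &msg);
		if (_ret < 0)
			return ret ? ret : _ret;
		memcpy(buf, &msg, sizeof(msg));
		ret += sizeof(msg);
		buf += sizeof(msg);
		count -= sizeof(msg);
		/* Only the very first message may be waited for. */
		no_wait = true;
	}
}
```

With two messages queued and room for three, a blocking model_read()
returns the size of two messages and never records a wait, because the
third attempt runs with no_wait already set.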