Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp1377155pxb; Thu, 28 Jan 2021 15:11:00 -0800 (PST) X-Google-Smtp-Source: ABdhPJz8RwX0h7YFiVY0S9hkbsrP8QZrE9oxoxaZbjZh9ElzeZZ5fwNkiNwChYoXSSsVW2XhLbPl X-Received: by 2002:a05:6402:26c9:: with SMTP id x9mr2120372edd.365.1611875460239; Thu, 28 Jan 2021 15:11:00 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1611875460; cv=none; d=google.com; s=arc-20160816; b=yJFPl2KCBer2roJV4QKhbsv8eXBtN5uMYNnLjxZs8RBpYRJAspYFGFwJ3sRwKm/X1O iXBjxYMgU3Pi6pl8JHoOzj7rFCllza7FrnG8kjeMH9KUULm4LPr8Ar72OURe+rIKPa6X ea4nuJvo/MDCjK6ltyC6cyMSbWTsLTJ/DJ4DmfLHviTt/ErNlZhIIfvHboNvZvb/qHJa ZsWZ1hvtp3AqWdrayi6V5w5g8rBtDc0AYsAO9gw7ijPoYboxEErDRVh6BTwJbc7UoN24 2MFAmAQ5Q/6+YTgZcbIALaDcWv09lueriYSDJuVoXwsIByjr3isXeI0HefGvVdmVirqm 2SAQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=KGUh/DS9Q4jQYnAuDXRtCp4N/HUOyQduKeiIfUX2BOU=; b=za8ZVv669Xktt6Y29pWVQCYodu8F9TrXDwvpkjWkbP5//kNYvB90MJU3loLxIoafqq P++LYPRtLC1FJYVyFahhu6RjOwGgXosCIUvnMqGT+h57IXPkFrRb8M1YHMjpAWVXWJ9E K78APU+2wND8mn5EPU3moSFaF21XpCaA8oGNZ/KAwU7h4vM1UfQgDzpU571h6pGruO/m WykRmLsqvZE/OEwEp5QhzHDoP/n/g2WvYLxY6WeE31SjR46eINLIYsm0MDmBftN3t9Y2 MLk69GGcsBpj3YhwT14z8Qx84zoxJEv3eNc2EIaxUQio08LDN+kgkChczOur7Q+83ho6 Gbgg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id y38si3958113ede.59.2021.01.28.15.10.35; Thu, 28 Jan 2021 15:11:00 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231421AbhA1XH5 (ORCPT + 99 others); Thu, 28 Jan 2021 18:07:57 -0500 Received: from hmm.wantstofly.org ([213.239.204.108]:50856 "EHLO mail.wantstofly.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229757AbhA1XHz (ORCPT ); Thu, 28 Jan 2021 18:07:55 -0500 Received: by mail.wantstofly.org (Postfix, from userid 1000) id EB62C7F45D; Fri, 29 Jan 2021 01:07:10 +0200 (EET) Date: Fri, 29 Jan 2021 01:07:10 +0200 From: Lennert Buytenhek To: David Laight , Al Viro Cc: Jens Axboe , linux-kernel@vger.kernel.org, io-uring@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-btrfs@vger.kernel.org Subject: Re: [RFC PATCH] io_uring: add support for IORING_OP_GETDENTS64 Message-ID: <20210128230710.GA190469@wantstofly.org> References: <20210123114152.GA120281@wantstofly.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Jan 24, 2021 at 10:21:38PM +0000, David Laight wrote: > > One open question is whether IORING_OP_GETDENTS64 should be more like > > pread(2)?and allow passing in a starting offset to read from the > > directory from. (This would require some more surgery in fs/readdir.c.) > > Since directories are seekable this ought to work. > Modulo horrid issues with 32bit file offsets. The incremental patch below does this. (It doesn't apply cleanly on top of v1 of the IORING_OP_GETDENTS patch as I have other changes in my tree -- I'm including it just to illustrate the changes that would make this work.) This change seems to work, and makes IORING_OP_GETDENTS take an explicitly specified directory offset (instead of using the file's ->f_pos), making it more like pread(2), and I like the change from a conceptual point of view, but it's a bit ugly around iterate_dir_use_ctx_pos(). Any thoughts on how to do this more cleanly (without breaking iterate_dir() semantics)? > You'd need to return the final offset to allow another > read to continue from the end position. We can use the ->d_off value as returned in the last struct linux_dirent64 as the directory offset to continue reading from with the next IORING_OP_GETDENTS call, illustrated by the patch to uringfind.c at the bottom. diff --git a/fs/io_uring.c b/fs/io_uring.c index 13dd29f8ebb3..0f9707ed9294 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -576,6 +576,7 @@ struct io_getdents { struct file *file; struct linux_dirent64 __user *dirent; unsigned int count; + loff_t pos; }; struct io_completion { @@ -4584,9 +4585,10 @@ static int io_getdents_prep(struct io_kiocb *req, if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL)) return -EINVAL; - if (sqe->ioprio || sqe->off || sqe->rw_flags || sqe->buf_index) + if (sqe->ioprio || sqe->rw_flags || sqe->buf_index) return -EINVAL; + getdents->pos = READ_ONCE(sqe->off); getdents->dirent = u64_to_user_ptr(READ_ONCE(sqe->addr)); getdents->count = READ_ONCE(sqe->len); return 0; @@ -4601,7 +4603,8 @@ static int io_getdents(struct io_kiocb *req, bool force_nonblock) if (force_nonblock) return -EAGAIN; - ret = vfs_getdents(req->file, getdents->dirent, getdents->count); + ret = vfs_getdents(req->file, getdents->dirent, getdents->count, + &getdents->pos); if (ret < 0) { if (ret == -ERESTARTSYS) ret = -EINTR; diff --git a/fs/readdir.c b/fs/readdir.c index f52167c1eb61..d6bd78f6350a 100644 --- a/fs/readdir.c +++ b/fs/readdir.c @@ -37,7 +37,7 @@ } while (0) -int iterate_dir(struct file *file, struct dir_context *ctx) +int iterate_dir_use_ctx_pos(struct file *file, struct dir_context *ctx) { struct inode *inode = file_inode(file); bool shared = false; @@ -60,12 +60,10 @@ int iterate_dir(struct file *file, struct dir_context *ctx) res = -ENOENT; if (!IS_DEADDIR(inode)) { - ctx->pos = file->f_pos; if (shared) res = file->f_op->iterate_shared(file, ctx); else res = file->f_op->iterate(file, ctx); - file->f_pos = ctx->pos; fsnotify_access(file); file_accessed(file); } @@ -76,6 +74,17 @@ int iterate_dir(struct file *file, struct dir_context *ctx) out: return res; } + +int iterate_dir(struct file *file, struct dir_context *ctx) +{ + int res; + + ctx->pos = file->f_pos; + res = iterate_dir_use_ctx_pos(file, ctx); + file->f_pos = ctx->pos; + + return res; +} EXPORT_SYMBOL(iterate_dir); /* @@ -349,16 +358,18 @@ static int filldir64(struct dir_context *ctx, const char *name, int namlen, } int vfs_getdents(struct file *file, struct linux_dirent64 __user *dirent, - unsigned int count) + unsigned int count, loff_t *pos) { struct getdents_callback64 buf = { .ctx.actor = filldir64, + .ctx.pos = *pos, .count = count, .current_dir = dirent }; int error; - error = iterate_dir(file, &buf.ctx); + error = iterate_dir_use_ctx_pos(file, &buf.ctx); + *pos = buf.ctx.pos; if (error >= 0) error = buf.error; if (buf.prev_reclen) { @@ -384,7 +395,7 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd, if (!f.file) return -EBADF; - error = vfs_getdents(f.file, dirent, count); + error = vfs_getdents(f.file, dirent, count, &f.file->f_pos); fdput_pos(f); return error; } diff --git a/include/linux/fs.h b/include/linux/fs.h index 114885d3f6c4..4d9d96163f92 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3107,11 +3107,12 @@ const char *simple_get_link(struct dentry *, struct inode *, struct delayed_call *); extern const struct inode_operations simple_symlink_inode_operations; +extern int iterate_dir_use_ctx_pos(struct file *, struct dir_context *); extern int iterate_dir(struct file *, struct dir_context *); struct linux_dirent64; int vfs_getdents(struct file *file, struct linux_dirent64 __user *dirent, - unsigned int count); + unsigned int count, loff_t *pos); int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat, int flags); Corresponding uringfind.c change: diff --git a/uringfind.c b/uringfind.c index 4282296..e140388 100644 --- a/uringfind.c +++ b/uringfind.c @@ -22,9 +22,10 @@ struct linux_dirent64 { }; static inline void io_uring_prep_getdents(struct io_uring_sqe *sqe, int fd, - void *buf, unsigned int count) + void *buf, unsigned int count, + uint64_t off) { - io_uring_prep_rw(IORING_OP_GETDENTS, sqe, fd, buf, count, 0); + io_uring_prep_rw(IORING_OP_GETDENTS, sqe, fd, buf, count, off); } @@ -38,6 +39,7 @@ struct dir { struct dir *parent; int fd; + uint64_t off; uint8_t buf[16384]; char name[0]; }; @@ -131,7 +133,8 @@ static void schedule_readdir(struct dir *dir) struct io_uring_sqe *sqe; sqe = get_sqe(); - io_uring_prep_getdents(sqe, dir->fd, dir->buf, sizeof(dir->buf)); + io_uring_prep_getdents(sqe, dir->fd, dir->buf, sizeof(dir->buf), + dir->off); io_uring_sqe_set_data(sqe, dir); } @@ -145,6 +148,7 @@ static void opendir_completion(struct dir *dir, int ret) } dir->fd = ret; + dir->off = 0; schedule_readdir(dir); } @@ -179,6 +183,7 @@ static void readdir_completion(struct dir *dir, int ret) schedule_opendir(dir, dent->d_name); } + dir->off = dent->d_off; bufp += dent->d_reclen; }