2021-08-21 15:54:40

by Pavel Begunkov

[permalink] [raw]
Subject: [PATCH v3 0/4] open/accept directly into io_uring fixed file table

Add an optional feature to open/accept directly into io_uring's fixed
file table bypassing the normal file table. Same behaviour if as the
snippet below, but in one operation:

sqe = prep_[open,accept](...);
cqe = submit_and_wait(sqe);
io_uring_register_files_update(uring_idx, (fd = cqe->res));
close((fd = cqe->res));

The idea in pretty old, and was brough up and implemented a year ago
by Josh Triplett, though haven't sought the light for some reasons.

The behaviour is controlled by setting sqe->file_index, where 0 implies
the old behaviour. If non-zero value is specified, then it will behave
as described and place the file into a fixed file slot
sqe->file_index - 1. A file table should be already created, the slot
should be valid and empty, otherwise the operation will fail.

we can't use IOSQE_FIXED_FILE to switch between modes, because accept
takes a file, and it already uses the flag with a different meaning.

since RFC:
- added attribution
- updated descriptions
- rebased

since v1:
- EBADF if slot is already used (Josh Triplett)
- alias index with splice_fd_in (Josh Triplett)
- fix a bound check bug

Pavel Begunkov (4):
net: add accept helper not installing fd
io_uring: openat directly into fixed fd table
io_uring: hand code io_accept() fd installing
io_uring: accept directly into fixed file table

fs/io_uring.c | 129 +++++++++++++++++++++++++++++-----
include/linux/socket.h | 3 +
include/uapi/linux/io_uring.h | 5 +-
net/socket.c | 71 ++++++++++---------
4 files changed, 157 insertions(+), 51 deletions(-)

--
2.32.0


2021-08-21 15:54:51

by Pavel Begunkov

[permalink] [raw]
Subject: [PATCH v3 3/4] io_uring: hand code io_accept() fd installing

Make io_accept() to handle file descriptor allocations and installation.
A preparation patch for bypassing file tables.

Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index b8ef5ac1f90d..a54994a4f4ae 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4769,6 +4769,11 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
accept->addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr2));
accept->flags = READ_ONCE(sqe->accept_flags);
accept->nofile = rlimit(RLIMIT_NOFILE);
+
+ if (accept->flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK))
+ return -EINVAL;
+ if (SOCK_NONBLOCK != O_NONBLOCK && (accept->flags & SOCK_NONBLOCK))
+ accept->flags = (accept->flags & ~SOCK_NONBLOCK) | O_NONBLOCK;
return 0;
}

@@ -4777,20 +4782,28 @@ static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
struct io_accept *accept = &req->accept;
bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
unsigned int file_flags = force_nonblock ? O_NONBLOCK : 0;
- int ret;
+ struct file *file;
+ int ret, fd;

if (req->file->f_flags & O_NONBLOCK)
req->flags |= REQ_F_NOWAIT;

- ret = __sys_accept4_file(req->file, file_flags, accept->addr,
- accept->addr_len, accept->flags,
- accept->nofile);
- if (ret == -EAGAIN && force_nonblock)
- return -EAGAIN;
- if (ret < 0) {
+ fd = __get_unused_fd_flags(accept->flags, accept->nofile);
+ if (unlikely(fd < 0))
+ return fd;
+
+ file = do_accept(req->file, file_flags, accept->addr, accept->addr_len,
+ accept->flags);
+ if (IS_ERR(file)) {
+ ret = PTR_ERR(file);
+ if (ret == -EAGAIN && force_nonblock)
+ return -EAGAIN;
if (ret == -ERESTARTSYS)
ret = -EINTR;
req_set_fail(req);
+ } else {
+ fd_install(fd, file);
+ ret = fd;
}
__io_req_complete(req, issue_flags, ret, 0);
return 0;
--
2.32.0

2021-08-21 15:55:32

by Pavel Begunkov

[permalink] [raw]
Subject: [PATCH v3 4/4] io_uring: accept directly into fixed file table

As done with open opcodes, allow accept to skip installing fd into
processes' file tables and put it directly into io_uring's fixed file
table. Same restrictions and design as for open.

Suggested-by: Josh Triplett <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index a54994a4f4ae..0323e947f403 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4759,10 +4759,11 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
struct io_accept *accept = &req->accept;
+ unsigned index;

if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
return -EINVAL;
- if (sqe->ioprio || sqe->len || sqe->buf_index || sqe->splice_fd_in)
+ if (sqe->ioprio || sqe->len || sqe->buf_index)
return -EINVAL;

accept->addr = u64_to_user_ptr(READ_ONCE(sqe->addr));
@@ -4770,6 +4771,17 @@ static int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
accept->flags = READ_ONCE(sqe->accept_flags);
accept->nofile = rlimit(RLIMIT_NOFILE);

+ index = READ_ONCE(sqe->file_index);
+ req->buf_index = index;
+ if (index) {
+ if (req->open.how.flags & O_CLOEXEC)
+ return -EINVAL;
+ if (req->buf_index != index)
+ return -EINVAL;
+ }
+
+ if (req->buf_index && (accept->flags & SOCK_CLOEXEC))
+ return -EINVAL;
if (accept->flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK))
return -EINVAL;
if (SOCK_NONBLOCK != O_NONBLOCK && (accept->flags & SOCK_NONBLOCK))
@@ -4782,28 +4794,34 @@ static int io_accept(struct io_kiocb *req, unsigned int issue_flags)
struct io_accept *accept = &req->accept;
bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
unsigned int file_flags = force_nonblock ? O_NONBLOCK : 0;
+ bool fixed = !!req->buf_index;
struct file *file;
int ret, fd;

if (req->file->f_flags & O_NONBLOCK)
req->flags |= REQ_F_NOWAIT;

- fd = __get_unused_fd_flags(accept->flags, accept->nofile);
- if (unlikely(fd < 0))
- return fd;
-
+ if (!fixed) {
+ fd = __get_unused_fd_flags(accept->flags, accept->nofile);
+ if (unlikely(fd < 0))
+ return fd;
+ }
file = do_accept(req->file, file_flags, accept->addr, accept->addr_len,
accept->flags);
if (IS_ERR(file)) {
+ if (!fixed)
+ put_unused_fd(fd);
ret = PTR_ERR(file);
if (ret == -EAGAIN && force_nonblock)
return -EAGAIN;
if (ret == -ERESTARTSYS)
ret = -EINTR;
req_set_fail(req);
- } else {
+ } else if (!fixed) {
fd_install(fd, file);
ret = fd;
+ } else {
+ ret = io_install_fixed_file(req, file, issue_flags);
}
__io_req_complete(req, issue_flags, ret, 0);
return 0;
--
2.32.0

2021-08-21 15:55:56

by Pavel Begunkov

[permalink] [raw]
Subject: [PATCH v3 2/4] io_uring: openat directly into fixed fd table

Instead of opening a file into a process's file table as usual and then
registering the fd within io_uring, some users may want to skip the
first step and place it directly into io_uring's fixed file table.
This patch adds such a capability for IORING_OP_OPENAT and
IORING_OP_OPENAT2.

The behaviour is controlled by setting sqe->file_index, where 0 implies
the old behaviour. If non-zero value is specified, then it will behave
as described and place the file into a fixed file slot
sqe->file_index - 1. A file table should be already created, the slot
should be valid and empty, otherwise the operation will fail.

Keep the error codes consistent with IORING_OP_FILES_UPDATE, ENXIO and
EINVAL on inappropriate fixed tables, and return EBADF on collision with
already registered file.

Note: we can't use IOSQE_FIXED_FILE to switch between modes, because
accept takes a file, and it already uses the flag with a different
meaning.

Suggested-by: Josh Triplett <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 82 +++++++++++++++++++++++++++++++----
include/uapi/linux/io_uring.h | 5 ++-
2 files changed, 78 insertions(+), 9 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 0fb75aa72b69..b8ef5ac1f90d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1063,6 +1063,9 @@ static void io_req_task_queue(struct io_kiocb *req);
static void io_submit_flush_completions(struct io_ring_ctx *ctx);
static int io_req_prep_async(struct io_kiocb *req);

+static int io_install_fixed_file(struct io_kiocb *req, struct file *file,
+ unsigned int issue_flags);
+
static struct kmem_cache *req_cachep;

static const struct file_operations io_uring_fops;
@@ -3815,11 +3818,12 @@ static int io_fallocate(struct io_kiocb *req, unsigned int issue_flags)
static int __io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
const char __user *fname;
+ unsigned index;
int ret;

if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
return -EINVAL;
- if (unlikely(sqe->ioprio || sqe->buf_index || sqe->splice_fd_in))
+ if (unlikely(sqe->ioprio || sqe->buf_index))
return -EINVAL;
if (unlikely(req->flags & REQ_F_FIXED_FILE))
return -EBADF;
@@ -3836,6 +3840,16 @@ static int __io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe
req->open.filename = NULL;
return ret;
}
+
+ index = READ_ONCE(sqe->file_index);
+ req->buf_index = index;
+ if (index) {
+ if (req->open.how.flags & O_CLOEXEC)
+ return -EINVAL;
+ if (req->buf_index != index)
+ return -EINVAL;
+ }
+
req->open.nofile = rlimit(RLIMIT_NOFILE);
req->flags |= REQ_F_NEED_CLEANUP;
return 0;
@@ -3873,8 +3887,8 @@ static int io_openat2(struct io_kiocb *req, unsigned int issue_flags)
{
struct open_flags op;
struct file *file;
- bool nonblock_set;
- bool resolve_nonblock;
+ bool resolve_nonblock, nonblock_set;
+ bool fixed = !!req->buf_index;
int ret;

ret = build_open_flags(&req->open.how, &op);
@@ -3893,9 +3907,11 @@ static int io_openat2(struct io_kiocb *req, unsigned int issue_flags)
op.open_flag |= O_NONBLOCK;
}

- ret = __get_unused_fd_flags(req->open.how.flags, req->open.nofile);
- if (ret < 0)
- goto err;
+ if (!fixed) {
+ ret = __get_unused_fd_flags(req->open.how.flags, req->open.nofile);
+ if (ret < 0)
+ goto err;
+ }

file = do_filp_open(req->open.dfd, req->open.filename, &op);
if (IS_ERR(file)) {
@@ -3904,7 +3920,8 @@ static int io_openat2(struct io_kiocb *req, unsigned int issue_flags)
* marginal gain for something that is now known to be a slower
* path. So just put it, and we'll get a new one when we retry.
*/
- put_unused_fd(ret);
+ if (!fixed)
+ put_unused_fd(ret);

ret = PTR_ERR(file);
/* only retry if RESOLVE_CACHED wasn't already set by application */
@@ -3917,7 +3934,11 @@ static int io_openat2(struct io_kiocb *req, unsigned int issue_flags)
if ((issue_flags & IO_URING_F_NONBLOCK) && !nonblock_set)
file->f_flags &= ~O_NONBLOCK;
fsnotify_open(file);
- fd_install(ret, file);
+
+ if (!fixed)
+ fd_install(ret, file);
+ else
+ ret = io_install_fixed_file(req, file, issue_flags);
err:
putname(req->open.filename);
req->flags &= ~REQ_F_NEED_CLEANUP;
@@ -7846,6 +7867,50 @@ static int io_sqe_file_register(struct io_ring_ctx *ctx, struct file *file,
#endif
}

+static int io_install_fixed_file(struct io_kiocb *req, struct file *file,
+ unsigned int issue_flags)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ int i = req->buf_index - 1;
+ bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
+ struct io_fixed_file *file_slot;
+ int ret = -EBADF;
+
+ if (WARN_ON_ONCE(req->buf_index == 0))
+ goto err;
+
+ io_ring_submit_lock(ctx, !force_nonblock);
+ if (file->f_op == &io_uring_fops)
+ goto err;
+ ret = -ENXIO;
+ if (!ctx->file_data)
+ goto err;
+ ret = -EINVAL;
+ if (i >= ctx->nr_user_files)
+ goto err;
+
+ i = array_index_nospec(i, ctx->nr_user_files);
+ file_slot = io_fixed_file_slot(&ctx->file_table, i);
+ ret = -EBADF;
+ if (file_slot->file_ptr)
+ goto err;
+
+ *io_get_tag_slot(ctx->file_data, i) = 0;
+ io_fixed_file_set(file_slot, file);
+ ret = io_sqe_file_register(ctx, file, i);
+ if (ret) {
+ file_slot->file_ptr = 0;
+ goto err;
+ }
+
+ ret = 0;
+err:
+ io_ring_submit_unlock(ctx, !force_nonblock);
+ if (ret)
+ fput(file);
+ return ret;
+}
+
static int io_queue_rsrc_removal(struct io_rsrc_data *data, unsigned idx,
struct io_rsrc_node *node, void *rsrc)
{
@@ -10311,6 +10376,7 @@ static int __init io_uring_init(void)
BUILD_BUG_SQE_ELEM(40, __u16, buf_group);
BUILD_BUG_SQE_ELEM(42, __u16, personality);
BUILD_BUG_SQE_ELEM(44, __s32, splice_fd_in);
+ BUILD_BUG_SQE_ELEM(44, __u32, file_index);

BUILD_BUG_ON(sizeof(struct io_uring_files_update) !=
sizeof(struct io_uring_rsrc_update));
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 79126d5cd289..45a4f2373694 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -55,7 +55,10 @@ struct io_uring_sqe {
} __attribute__((packed));
/* personality to use, if used */
__u16 personality;
- __s32 splice_fd_in;
+ union {
+ __s32 splice_fd_in;
+ __u32 file_index;
+ };
__u64 __pad2[2];
};

--
2.32.0

2021-08-21 15:56:32

by Pavel Begunkov

[permalink] [raw]
Subject: [PATCH v3 1/4] net: add accept helper not installing fd

Introduce and reuse a helper that acts similarly to __sys_accept4_file()
but returns struct file instead of installing file descriptor. Will be
used by io_uring.

Signed-off-by: Pavel Begunkov <[email protected]>
---
include/linux/socket.h | 3 ++
net/socket.c | 71 ++++++++++++++++++++++--------------------
2 files changed, 40 insertions(+), 34 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 0d8e3dcb7f88..d3c1a42a2edd 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -421,6 +421,9 @@ extern int __sys_accept4_file(struct file *file, unsigned file_flags,
struct sockaddr __user *upeer_sockaddr,
int __user *upeer_addrlen, int flags,
unsigned long nofile);
+extern struct file *do_accept(struct file *file, unsigned file_flags,
+ struct sockaddr __user *upeer_sockaddr,
+ int __user *upeer_addrlen, int flags);
extern int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr,
int __user *upeer_addrlen, int flags);
extern int __sys_socket(int family, int type, int protocol);
diff --git a/net/socket.c b/net/socket.c
index 0b2dad3bdf7f..532fff5a3684 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1722,32 +1722,22 @@ SYSCALL_DEFINE2(listen, int, fd, int, backlog)
return __sys_listen(fd, backlog);
}

-int __sys_accept4_file(struct file *file, unsigned file_flags,
+struct file *do_accept(struct file *file, unsigned file_flags,
struct sockaddr __user *upeer_sockaddr,
- int __user *upeer_addrlen, int flags,
- unsigned long nofile)
+ int __user *upeer_addrlen, int flags)
{
struct socket *sock, *newsock;
struct file *newfile;
- int err, len, newfd;
+ int err, len;
struct sockaddr_storage address;

- if (flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK))
- return -EINVAL;
-
- if (SOCK_NONBLOCK != O_NONBLOCK && (flags & SOCK_NONBLOCK))
- flags = (flags & ~SOCK_NONBLOCK) | O_NONBLOCK;
-
sock = sock_from_file(file);
- if (!sock) {
- err = -ENOTSOCK;
- goto out;
- }
+ if (!sock)
+ return ERR_PTR(-ENOTSOCK);

- err = -ENFILE;
newsock = sock_alloc();
if (!newsock)
- goto out;
+ return ERR_PTR(-ENFILE);

newsock->type = sock->type;
newsock->ops = sock->ops;
@@ -1758,18 +1748,9 @@ int __sys_accept4_file(struct file *file, unsigned file_flags,
*/
__module_get(newsock->ops->owner);

- newfd = __get_unused_fd_flags(flags, nofile);
- if (unlikely(newfd < 0)) {
- err = newfd;
- sock_release(newsock);
- goto out;
- }
newfile = sock_alloc_file(newsock, flags, sock->sk->sk_prot_creator->name);
- if (IS_ERR(newfile)) {
- err = PTR_ERR(newfile);
- put_unused_fd(newfd);
- goto out;
- }
+ if (IS_ERR(newfile))
+ return newfile;

err = security_socket_accept(sock, newsock);
if (err)
@@ -1794,16 +1775,38 @@ int __sys_accept4_file(struct file *file, unsigned file_flags,
}

/* File flags are not inherited via accept() unlike another OSes. */
-
- fd_install(newfd, newfile);
- err = newfd;
-out:
- return err;
+ return newfile;
out_fd:
fput(newfile);
- put_unused_fd(newfd);
- goto out;
+ return ERR_PTR(err);
+}
+
+int __sys_accept4_file(struct file *file, unsigned file_flags,
+ struct sockaddr __user *upeer_sockaddr,
+ int __user *upeer_addrlen, int flags,
+ unsigned long nofile)
+{
+ struct file *newfile;
+ int newfd;

+ if (flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK))
+ return -EINVAL;
+
+ if (SOCK_NONBLOCK != O_NONBLOCK && (flags & SOCK_NONBLOCK))
+ flags = (flags & ~SOCK_NONBLOCK) | O_NONBLOCK;
+
+ newfd = __get_unused_fd_flags(flags, nofile);
+ if (unlikely(newfd < 0))
+ return newfd;
+
+ newfile = do_accept(file, file_flags, upeer_sockaddr, upeer_addrlen,
+ flags);
+ if (IS_ERR(newfile)) {
+ put_unused_fd(newfd);
+ return PTR_ERR(newfile);
+ }
+ fd_install(newfd, newfile);
+ return newfd;
}

/*
--
2.32.0

2021-08-22 02:20:02

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] open/accept directly into io_uring fixed file table

On 8/21/21 9:52 AM, Pavel Begunkov wrote:
> Add an optional feature to open/accept directly into io_uring's fixed
> file table bypassing the normal file table. Same behaviour if as the
> snippet below, but in one operation:
>
> sqe = prep_[open,accept](...);
> cqe = submit_and_wait(sqe);
> io_uring_register_files_update(uring_idx, (fd = cqe->res));
> close((fd = cqe->res));
>
> The idea in pretty old, and was brough up and implemented a year ago
> by Josh Triplett, though haven't sought the light for some reasons.
>
> The behaviour is controlled by setting sqe->file_index, where 0 implies
> the old behaviour. If non-zero value is specified, then it will behave
> as described and place the file into a fixed file slot
> sqe->file_index - 1. A file table should be already created, the slot
> should be valid and empty, otherwise the operation will fail.
>
> we can't use IOSQE_FIXED_FILE to switch between modes, because accept
> takes a file, and it already uses the flag with a different meaning.
>
> since RFC:
> - added attribution
> - updated descriptions
> - rebased
>
> since v1:
> - EBADF if slot is already used (Josh Triplett)
> - alias index with splice_fd_in (Josh Triplett)
> - fix a bound check bug

With the prep series, this looks good to me now. Josh, what do you
think?

And we need the net folks to sign off on the first patch, of course.

--
Jens Axboe

2021-08-23 16:38:07

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH v3 1/4] net: add accept helper not installing fd

On Sat, 21 Aug 2021 16:52:37 +0100 Pavel Begunkov wrote:
> Introduce and reuse a helper that acts similarly to __sys_accept4_file()
> but returns struct file instead of installing file descriptor. Will be
> used by io_uring.
>
> Signed-off-by: Pavel Begunkov <[email protected]>

Acked-by: Jakub Kicinski <[email protected]>

2021-08-23 19:17:51

by Josh Triplett

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] open/accept directly into io_uring fixed file table

On Sat, Aug 21, 2021 at 08:18:12PM -0600, Jens Axboe wrote:
> On 8/21/21 9:52 AM, Pavel Begunkov wrote:
> > Add an optional feature to open/accept directly into io_uring's fixed
> > file table bypassing the normal file table. Same behaviour if as the
> > snippet below, but in one operation:
> >
> > sqe = prep_[open,accept](...);
> > cqe = submit_and_wait(sqe);
> > io_uring_register_files_update(uring_idx, (fd = cqe->res));
> > close((fd = cqe->res));
> >
> > The idea in pretty old, and was brough up and implemented a year ago
> > by Josh Triplett, though haven't sought the light for some reasons.
> >
> > The behaviour is controlled by setting sqe->file_index, where 0 implies
> > the old behaviour. If non-zero value is specified, then it will behave
> > as described and place the file into a fixed file slot
> > sqe->file_index - 1. A file table should be already created, the slot
> > should be valid and empty, otherwise the operation will fail.
> >
> > we can't use IOSQE_FIXED_FILE to switch between modes, because accept
> > takes a file, and it already uses the flag with a different meaning.
> >
> > since RFC:
> > - added attribution
> > - updated descriptions
> > - rebased
> >
> > since v1:
> > - EBADF if slot is already used (Josh Triplett)
> > - alias index with splice_fd_in (Josh Triplett)
> > - fix a bound check bug
>
> With the prep series, this looks good to me now. Josh, what do you
> think?

I would still like to see this using a union with the `nofile` field in
io_open and io_accept, rather than overloading the 16-bit buf_index
field. That would avoid truncating to 16 bits, and make less work for
expansion to more than 16 bits of fixed file indexes.

(I'd also like that to actually use a union, rather than overloading the
meaning of buf_index/nofile.)

I personally still feel that using non-zero to signify index-plus-one is
both error-prone and not as future-compatible. I think we could do
better with no additional overhead. But I think the final call on that
interface is up to you, Jens. Do you think it'd be worth spending a flag
bit or using a different opcode, to get a cleaner interface? If you
don't, then I'd be fine with seeing this go in with just the io_open and
io_accept change.

- Josh Triplett

2021-08-23 19:41:42

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] open/accept directly into io_uring fixed file table

On 8/23/21 1:13 PM, Josh Triplett wrote:
> On Sat, Aug 21, 2021 at 08:18:12PM -0600, Jens Axboe wrote:
>> On 8/21/21 9:52 AM, Pavel Begunkov wrote:
>>> Add an optional feature to open/accept directly into io_uring's fixed
>>> file table bypassing the normal file table. Same behaviour if as the
>>> snippet below, but in one operation:
>>>
>>> sqe = prep_[open,accept](...);
>>> cqe = submit_and_wait(sqe);
>>> io_uring_register_files_update(uring_idx, (fd = cqe->res));
>>> close((fd = cqe->res));
>>>
>>> The idea in pretty old, and was brough up and implemented a year ago
>>> by Josh Triplett, though haven't sought the light for some reasons.
>>>
>>> The behaviour is controlled by setting sqe->file_index, where 0 implies
>>> the old behaviour. If non-zero value is specified, then it will behave
>>> as described and place the file into a fixed file slot
>>> sqe->file_index - 1. A file table should be already created, the slot
>>> should be valid and empty, otherwise the operation will fail.
>>>
>>> we can't use IOSQE_FIXED_FILE to switch between modes, because accept
>>> takes a file, and it already uses the flag with a different meaning.
>>>
>>> since RFC:
>>> - added attribution
>>> - updated descriptions
>>> - rebased
>>>
>>> since v1:
>>> - EBADF if slot is already used (Josh Triplett)
>>> - alias index with splice_fd_in (Josh Triplett)
>>> - fix a bound check bug
>>
>> With the prep series, this looks good to me now. Josh, what do you
>> think?
>
> I would still like to see this using a union with the `nofile` field in
> io_open and io_accept, rather than overloading the 16-bit buf_index
> field. That would avoid truncating to 16 bits, and make less work for
> expansion to more than 16 bits of fixed file indexes.
>
> (I'd also like that to actually use a union, rather than overloading the
> meaning of buf_index/nofile.)

Agree, and in fact there's room in the open and accept command parts, so
we can just make it a separate entry there instead of using ->buf_index.
Then just pass in the index to io_install_fixed_file() instead of having
it pull it from req->buf_index.

> I personally still feel that using non-zero to signify index-plus-one is
> both error-prone and not as future-compatible. I think we could do
> better with no additional overhead. But I think the final call on that
> interface is up to you, Jens. Do you think it'd be worth spending a flag
> bit or using a different opcode, to get a cleaner interface? If you
> don't, then I'd be fine with seeing this go in with just the io_open and
> io_accept change.

I'd be inclined to go the extra opcode route instead, as the flag only
really would make sense to requests that instantiate file descriptors.
For this particular case, we'd need 3 new opcodes for
openat/openat2/accept, which is probably a worthwhile expenditure.

Pavel, what do you think? Switch to using a different opcode for the new
requests, and just grab some space in io_open and io_accept for the fd
and pass it in to install.

I do think that'd end up being less hackish and easier to grok for a
user.

--
Jens Axboe

2021-08-24 09:50:00

by Pavel Begunkov

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] open/accept directly into io_uring fixed file table

On 8/23/21 8:40 PM, Jens Axboe wrote:
> On 8/23/21 1:13 PM, Josh Triplett wrote:
>> On Sat, Aug 21, 2021 at 08:18:12PM -0600, Jens Axboe wrote:
>>> On 8/21/21 9:52 AM, Pavel Begunkov wrote:
>>>> Add an optional feature to open/accept directly into io_uring's fixed
>>>> file table bypassing the normal file table. Same behaviour if as the
>>>> snippet below, but in one operation:
>>>>
>>>> sqe = prep_[open,accept](...);
>>>> cqe = submit_and_wait(sqe);
>>>> io_uring_register_files_update(uring_idx, (fd = cqe->res));
>>>> close((fd = cqe->res));
>>>>
>>>> The idea in pretty old, and was brough up and implemented a year ago
>>>> by Josh Triplett, though haven't sought the light for some reasons.
>>>>
>>>> The behaviour is controlled by setting sqe->file_index, where 0 implies
>>>> the old behaviour. If non-zero value is specified, then it will behave
>>>> as described and place the file into a fixed file slot
>>>> sqe->file_index - 1. A file table should be already created, the slot
>>>> should be valid and empty, otherwise the operation will fail.
>>>>
>>>> we can't use IOSQE_FIXED_FILE to switch between modes, because accept
>>>> takes a file, and it already uses the flag with a different meaning.
>>>>
>>>> since RFC:
>>>> - added attribution
>>>> - updated descriptions
>>>> - rebased
>>>>
>>>> since v1:
>>>> - EBADF if slot is already used (Josh Triplett)
>>>> - alias index with splice_fd_in (Josh Triplett)
>>>> - fix a bound check bug
>>>
>>> With the prep series, this looks good to me now. Josh, what do you
>>> think?
>>
>> I would still like to see this using a union with the `nofile` field in
>> io_open and io_accept, rather than overloading the 16-bit buf_index
>> field. That would avoid truncating to 16 bits, and make less work for
>> expansion to more than 16 bits of fixed file indexes.
>>
>> (I'd also like that to actually use a union, rather than overloading the
>> meaning of buf_index/nofile.)
>
> Agree, and in fact there's room in the open and accept command parts, so
> we can just make it a separate entry there instead of using ->buf_index.
> Then just pass in the index to io_install_fixed_file() instead of having
> it pull it from req->buf_index.

That's internal details, can be expanded at wish in the future, if we'd
ever need larger tables. ->buf_index already holds indexes to different
resources just fine.

Aliasing with nofile would rather be ugly, so the only option, as you
mentioned, is to grab some space from open/accept structs, but don't see
why we'd want it when there is a more convenient alternative.

>> I personally still feel that using non-zero to signify index-plus-one is
>> both error-prone and not as future-compatible. I think we could do
>> better with no additional overhead. But I think the final call on that
>> interface is up to you, Jens. Do you think it'd be worth spending a flag
>> bit or using a different opcode, to get a cleaner interface? If you
>> don't, then I'd be fine with seeing this go in with just the io_open and
>> io_accept change.
>
> I'd be inclined to go the extra opcode route instead, as the flag only
> really would make sense to requests that instantiate file descriptors.
> For this particular case, we'd need 3 new opcodes for
> openat/openat2/accept, which is probably a worthwhile expenditure.
>
> Pavel, what do you think? Switch to using a different opcode for the new
> requests, and just grab some space in io_open and io_accept for the fd
> and pass it in to install.

I don't get it, why it's even called hackish? How that's anyhow better?
To me the feature looks like a natural extension to the operations, just
like a read can be tuned with flags, so and creating new opcodes seems
a bit ugly, unnecessary taking space from opcodes and adding duplication
(even if both versions call the same handler).

First, why it's not future-compatible? It's a serious argument, but I
don't see where it came from. Do I miss something?

It's u32 now, and so will easily cover all indexes. SQE fields should
always be zeroed, that's a rule, liburing follows it, and there would
have been already lots of problems for users not honoring it. And there
will be a helper hiding all the index conversions for convenience.

void io_uring_prep_open_direct(sqe, index, ...)
{
io_uring_prep_open(sqe, ...);
sqe->file_index = index + 1;
}

--
Pavel Begunkov

2021-08-24 14:04:20

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] open/accept directly into io_uring fixed file table

On 8/24/21 3:48 AM, Pavel Begunkov wrote:
> On 8/23/21 8:40 PM, Jens Axboe wrote:
>> On 8/23/21 1:13 PM, Josh Triplett wrote:
>>> On Sat, Aug 21, 2021 at 08:18:12PM -0600, Jens Axboe wrote:
>>>> On 8/21/21 9:52 AM, Pavel Begunkov wrote:
>>>>> Add an optional feature to open/accept directly into io_uring's fixed
>>>>> file table bypassing the normal file table. Same behaviour if as the
>>>>> snippet below, but in one operation:
>>>>>
>>>>> sqe = prep_[open,accept](...);
>>>>> cqe = submit_and_wait(sqe);
>>>>> io_uring_register_files_update(uring_idx, (fd = cqe->res));
>>>>> close((fd = cqe->res));
>>>>>
>>>>> The idea in pretty old, and was brough up and implemented a year ago
>>>>> by Josh Triplett, though haven't sought the light for some reasons.
>>>>>
>>>>> The behaviour is controlled by setting sqe->file_index, where 0 implies
>>>>> the old behaviour. If non-zero value is specified, then it will behave
>>>>> as described and place the file into a fixed file slot
>>>>> sqe->file_index - 1. A file table should be already created, the slot
>>>>> should be valid and empty, otherwise the operation will fail.
>>>>>
>>>>> we can't use IOSQE_FIXED_FILE to switch between modes, because accept
>>>>> takes a file, and it already uses the flag with a different meaning.
>>>>>
>>>>> since RFC:
>>>>> - added attribution
>>>>> - updated descriptions
>>>>> - rebased
>>>>>
>>>>> since v1:
>>>>> - EBADF if slot is already used (Josh Triplett)
>>>>> - alias index with splice_fd_in (Josh Triplett)
>>>>> - fix a bound check bug
>>>>
>>>> With the prep series, this looks good to me now. Josh, what do you
>>>> think?
>>>
>>> I would still like to see this using a union with the `nofile` field in
>>> io_open and io_accept, rather than overloading the 16-bit buf_index
>>> field. That would avoid truncating to 16 bits, and make less work for
>>> expansion to more than 16 bits of fixed file indexes.
>>>
>>> (I'd also like that to actually use a union, rather than overloading the
>>> meaning of buf_index/nofile.)
>>
>> Agree, and in fact there's room in the open and accept command parts, so
>> we can just make it a separate entry there instead of using ->buf_index.
>> Then just pass in the index to io_install_fixed_file() instead of having
>> it pull it from req->buf_index.
>
> That's internal details, can be expanded at wish in the future, if we'd
> ever need larger tables. ->buf_index already holds indexes to different
> resources just fine.

Sure it's internal and can always be changed, doesn't change the fact
that it's a bit iffy that it's used differently in different spots. As
it costs us nothing to simply add a 'fixed_file' u32 for io_accept and
io_open, I really think that should be done instead.

> Aliasing with nofile would rather be ugly, so the only option, as you
> mentioned, is to grab some space from open/accept structs, but don't see
> why we'd want it when there is a more convenient alternative.

Because it's a lot more readable and less error prone imho. Agree on the
union, we don't have to resort to that.

>>> I personally still feel that using non-zero to signify index-plus-one is
>>> both error-prone and not as future-compatible. I think we could do
>>> better with no additional overhead. But I think the final call on that
>>> interface is up to you, Jens. Do you think it'd be worth spending a flag
>>> bit or using a different opcode, to get a cleaner interface? If you
>>> don't, then I'd be fine with seeing this go in with just the io_open and
>>> io_accept change.
>>
>> I'd be inclined to go the extra opcode route instead, as the flag only
>> really would make sense to requests that instantiate file descriptors.
>> For this particular case, we'd need 3 new opcodes for
>> openat/openat2/accept, which is probably a worthwhile expenditure.
>>
>> Pavel, what do you think? Switch to using a different opcode for the new
>> requests, and just grab some space in io_open and io_accept for the fd
>> and pass it in to install.
>
> I don't get it, why it's even called hackish? How that's anyhow better?
> To me the feature looks like a natural extension to the operations, just
> like a read can be tuned with flags, so and creating new opcodes seems
> a bit ugly, unnecessary taking space from opcodes and adding duplication
> (even if both versions call the same handler).

I agree that it's a natural extension, the problem is that we have to do
unnatural things (somewhat) to make it work. I'm fine with using the
union for the splice_fd_in to pass it in, I don't think it's a big deal.

I do wish that IORING_OP_CLOSE would work with them, though. I think we
should to that as a followup patch. It's a bit odd to be able to open a
file with IORING_OP_OPENAT and not being able to close it with
IORING_OP_CLOSE. For the latter, we should just give it fixed file
support, which would be pretty trivial.

> First, why it's not future-compatible? It's a serious argument, but I
> don't see where it came from. Do I miss something?
>
> It's u32 now, and so will easily cover all indexes. SQE fields should
> always be zeroed, that's a rule, liburing follows it, and there would
> have been already lots of problems for users not honoring it. And there
> will be a helper hiding all the index conversions for convenience.
>
> void io_uring_prep_open_direct(sqe, index, ...)
> {
> io_uring_prep_open(sqe, ...);
> sqe->file_index = index + 1;
> }

Let's keep it the way that it is, but I do want to see the buf_index
thing go away and just req->open.fixed_file or whatever being used for
open and accept. We should fold that in.

--
Jens Axboe

2021-08-24 14:45:38

by Pavel Begunkov

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] open/accept directly into io_uring fixed file table

On 8/24/21 3:02 PM, Jens Axboe wrote:
> On 8/24/21 3:48 AM, Pavel Begunkov wrote:
>> On 8/23/21 8:40 PM, Jens Axboe wrote:
>>> On 8/23/21 1:13 PM, Josh Triplett wrote:
>>>> On Sat, Aug 21, 2021 at 08:18:12PM -0600, Jens Axboe wrote:
>>>>> On 8/21/21 9:52 AM, Pavel Begunkov wrote:
>>>>>> Add an optional feature to open/accept directly into io_uring's fixed
>>>>>> file table bypassing the normal file table. Same behaviour if as the
>>>>>> snippet below, but in one operation:
>>>>>>
>>>>>> sqe = prep_[open,accept](...);
>>>>>> cqe = submit_and_wait(sqe);
>>>>>> io_uring_register_files_update(uring_idx, (fd = cqe->res));
>>>>>> close((fd = cqe->res));
>>>>>>
>>>>>> The idea in pretty old, and was brough up and implemented a year ago
>>>>>> by Josh Triplett, though haven't sought the light for some reasons.
>>>>>>
>>>>>> The behaviour is controlled by setting sqe->file_index, where 0 implies
>>>>>> the old behaviour. If non-zero value is specified, then it will behave
>>>>>> as described and place the file into a fixed file slot
>>>>>> sqe->file_index - 1. A file table should be already created, the slot
>>>>>> should be valid and empty, otherwise the operation will fail.
>>>>>>
>>>>>> we can't use IOSQE_FIXED_FILE to switch between modes, because accept
>>>>>> takes a file, and it already uses the flag with a different meaning.
>>>>>>
>>>>>> since RFC:
>>>>>> - added attribution
>>>>>> - updated descriptions
>>>>>> - rebased
>>>>>>
>>>>>> since v1:
>>>>>> - EBADF if slot is already used (Josh Triplett)
>>>>>> - alias index with splice_fd_in (Josh Triplett)
>>>>>> - fix a bound check bug
>>>>>
>>>>> With the prep series, this looks good to me now. Josh, what do you
>>>>> think?
>>>>
>>>> I would still like to see this using a union with the `nofile` field in
>>>> io_open and io_accept, rather than overloading the 16-bit buf_index
>>>> field. That would avoid truncating to 16 bits, and make less work for
>>>> expansion to more than 16 bits of fixed file indexes.
>>>>
>>>> (I'd also like that to actually use a union, rather than overloading the
>>>> meaning of buf_index/nofile.)
>>>
>>> Agree, and in fact there's room in the open and accept command parts, so
>>> we can just make it a separate entry there instead of using ->buf_index.
>>> Then just pass in the index to io_install_fixed_file() instead of having
>>> it pull it from req->buf_index.
>>
>> That's internal details, can be expanded at wish in the future, if we'd
>> ever need larger tables. ->buf_index already holds indexes to different
>> resources just fine.
>
> Sure it's internal and can always be changed, doesn't change the fact
> that it's a bit iffy that it's used differently in different spots. As
> it costs us nothing to simply add a 'fixed_file' u32 for io_accept and
> io_open, I really think that should be done instead.
>
>> Aliasing with nofile would rather be ugly, so the only option, as you
>> mentioned, is to grab some space from open/accept structs, but don't see
>> why we'd want it when there is a more convenient alternative.
>
> Because it's a lot more readable and less error prone imho. Agree on the
> union, we don't have to resort to that.

Ok, I don't have a strong opinion on that. Will resend



>>>> I personally still feel that using non-zero to signify index-plus-one is
>>>> both error-prone and not as future-compatible. I think we could do
>>>> better with no additional overhead. But I think the final call on that
>>>> interface is up to you, Jens. Do you think it'd be worth spending a flag
>>>> bit or using a different opcode, to get a cleaner interface? If you
>>>> don't, then I'd be fine with seeing this go in with just the io_open and
>>>> io_accept change.
>>>
>>> I'd be inclined to go the extra opcode route instead, as the flag only
>>> really would make sense to requests that instantiate file descriptors.
>>> For this particular case, we'd need 3 new opcodes for
>>> openat/openat2/accept, which is probably a worthwhile expenditure.
>>>
>>> Pavel, what do you think? Switch to using a different opcode for the new
>>> requests, and just grab some space in io_open and io_accept for the fd
>>> and pass it in to install.
>>
>> I don't get it, why it's even called hackish? How that's anyhow better?
>> To me the feature looks like a natural extension to the operations, just
>> like a read can be tuned with flags, so and creating new opcodes seems
>> a bit ugly, unnecessary taking space from opcodes and adding duplication
>> (even if both versions call the same handler).
>
> I agree that it's a natural extension, the problem is that we have to do
> unnatural things (somewhat) to make it work. I'm fine with using the
> union for the splice_fd_in to pass it in, I don't think it's a big deal.
>
> I do wish that IORING_OP_CLOSE would work with them, though. I think we
> should to that as a followup patch. It's a bit odd to be able to open a
> file with IORING_OP_OPENAT and not being able to close it with
> IORING_OP_CLOSE. For the latter, we should just give it fixed file
> support, which would be pretty trivial.
>
>> First, why it's not future-compatible? It's a serious argument, but I
>> don't see where it came from. Do I miss something?
>>
>> It's u32 now, and so will easily cover all indexes. SQE fields should
>> always be zeroed, that's a rule, liburing follows it, and there would
>> have been already lots of problems for users not honoring it. And there
>> will be a helper hiding all the index conversions for convenience.
>>
>> void io_uring_prep_open_direct(sqe, index, ...)
>> {
>> io_uring_prep_open(sqe, ...);
>> sqe->file_index = index + 1;
>> }
>
> Let's keep it the way that it is, but I do want to see the buf_index
> thing go away and just req->open.fixed_file or whatever being used for
> open and accept. We should fold that in.

--
Pavel Begunkov