2021-12-30 01:36:24

by Ammar Faizi

[permalink] [raw]
Subject: [RFC PATCH v1 0/3] io_uring: Add sendto(2) and recvfrom(2) support

Hello,

This RFC patchset adds sendto(2) and recvfrom(2) support for io_uring.
It also addresses an issue in the liburing GitHub repository [1].


## Motivations:
1) By using `sendto()` and `recvfrom()` we can make the submission
simpler compared to always using `sendmsg()` and `recvmsg()` from
the userspace.

2) There is a historical patch that tried to add the same
functionality, but did not end up being applied. [2]

On Tue, 7 Jul 2020 12:29:18 -0600, Jens Axboe <[email protected]> wrote:
> In a private conversation with the author, a good point was brought
> up that the sendto/recvfrom do not require an allocation of an async
> context, if we need to defer or go async with the request. I think
> that's a major win, to be honest. There are other benefits as well
> (like shorter path), but to me, the async less part is nice and will
> reduce overhead


## Changes summary
There are 3 patches in this series.

PATCH 1/3 renames io_recv to io_recvfrom and io_send to io_sendto.
Note that

send(sockfd, buf, len, flags);

is equivalent to

sendto(sockfd, buf, len, flags, NULL, 0);

and
recv(sockfd, buf, len, flags);

is equivalent to

recvfrom(sockfd, buf, len, flags, NULL, NULL);

So it is saner to have `send` and `recv` directed to `sendto` and
`recvfrom` instead of the opposite with respect to the name.


PATCH 2/3 makes `move_addr_to_user()` be a non static function. This
function lives in net/socket.c, we need to call this from io_uring
to add `recvfrom()` support for liburing. Added net files maintainers
to the CC list.

PATCH 3/3 adds `sendto(2)` and `recvfrom(2)` support for io_uring.
Added two new opcodes: IORING_OP_SENDTO and IORING_OP_RECVFROM.

## How to test

This patchset is based on "for-next" branch commit:

aafc6e6eba29c890b0031267fc37c43490447c81 ("Merge branch 'for-5.17/io_uring' into for-next")

It is also available in the Git repository at:

https://github.com/ammarfaizi2/linux-block ammarfaizi2/linux-block/io_uring-recvfrom-sendto


I also added the liburing support and test. The liburing support is
based on "xattr-getdents64" branch commit:

18d71076f6c97e1b25aa0e3b0e12a913ec4717fa ("src/include/liburing.h: style cleanups")

It is available in the Git repository at:

https://github.com/ammarfaizi2/liburing sendto-recvfrom


Link: https://github.com/axboe/liburing/issues/397 [1]
Link: https://lore.kernel.org/io-uring/[email protected]/ [2]
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: [email protected]
Signed-off-by: Ammar Faizi <[email protected]>
---
Ammar Faizi (3):
io_uring: Rename `io_{send,recv}` to `io_{sendto,recvfrom}`
net: Make `move_addr_to_user()` be a non static function
io_uring: Add `sendto(2)` and `recvfrom(2)` support

fs/io_uring.c | 88 +++++++++++++++++++++++++++++++----
include/linux/socket.h | 2 +
include/uapi/linux/io_uring.h | 2 +
net/socket.c | 4 +-
4 files changed, 86 insertions(+), 10 deletions(-)


base-commit: aafc6e6eba29c890b0031267fc37c43490447c81
--
Ammar Faizi


2021-12-30 01:36:27

by Ammar Faizi

[permalink] [raw]
Subject: [RFC PATCH v1 1/3] io_uring: Rename `io_{send,recv}` to `io_{sendto,recvfrom}`

Currently we can perform `send` and `recv` via io_uring. And now, we
are going to add `sendto` and `recvfrom` support for io_uring.

Note that:
Calling `send(fd, buf, len, flags)` is equivalent to calling
`sendto(fd, buf, len, flags, NULL, 0)`. Therefore, `sendto`
is a superset of `send`.

Calling `recv(fd, buf, len, flags)` is equivalent to calling
`recvfrom(fd, buf, len, flags, NULL, NULL)`. Therefore, `recvfrom`
is a superset of `recv`.

As such, let's direct the current supported `IORING_OP_{SEND,RECV}` to
`io_{sendto,recvfrom}`. These functions will also be used for
`IORING_OP_{SENDTO,RECVFROM}` operation in the next patches.

Signed-off-by: Ammar Faizi <[email protected]>
---
fs/io_uring.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 90002bb3fdf4..d564f98d5d3b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5273,7 +5273,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
return 0;
}

-static int io_send(struct io_kiocb *req, unsigned int issue_flags)
+static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_sr_msg *sr = &req->sr_msg;
struct msghdr msg;
@@ -5499,7 +5499,7 @@ static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
return 0;
}

-static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
+static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_buffer *kbuf;
struct io_sr_msg *sr = &req->sr_msg;
@@ -7061,13 +7061,13 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
ret = io_sendmsg(req, issue_flags);
break;
case IORING_OP_SEND:
- ret = io_send(req, issue_flags);
+ ret = io_sendto(req, issue_flags);
break;
case IORING_OP_RECVMSG:
ret = io_recvmsg(req, issue_flags);
break;
case IORING_OP_RECV:
- ret = io_recv(req, issue_flags);
+ ret = io_recvfrom(req, issue_flags);
break;
case IORING_OP_TIMEOUT:
ret = io_timeout(req, issue_flags);
--
2.32.0


2021-12-30 01:36:32

by Ammar Faizi

[permalink] [raw]
Subject: [RFC PATCH v1 2/3] net: Make `move_addr_to_user()` be a non static function

In order to add recvfrom support for io_uring, we need to call
`move_addr_to_user()` in fs/io_uring.c.

This makes `move_addr_to_user()` be a non static function so we can
call it from io_uring.

Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: [email protected]
Signed-off-by: Ammar Faizi <[email protected]>
---
include/linux/socket.h | 2 ++
net/socket.c | 4 ++--
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 8ef26d89ef49..0d0bc1ace50c 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -371,6 +371,8 @@ struct ucred {
#define IPX_TYPE 1

extern int move_addr_to_kernel(void __user *uaddr, int ulen, struct sockaddr_storage *kaddr);
+extern int move_addr_to_user(struct sockaddr_storage *kaddr, int klen,
+ void __user *uaddr, int __user *ulen);
extern int put_cmsg(struct msghdr*, int level, int type, int len, void *data);

struct timespec64;
diff --git a/net/socket.c b/net/socket.c
index 7f64a6eccf63..af521d351c8a 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -267,8 +267,8 @@ int move_addr_to_kernel(void __user *uaddr, int ulen, struct sockaddr_storage *k
* specified. Zero is returned for a success.
*/

-static int move_addr_to_user(struct sockaddr_storage *kaddr, int klen,
- void __user *uaddr, int __user *ulen)
+int move_addr_to_user(struct sockaddr_storage *kaddr, int klen,
+ void __user *uaddr, int __user *ulen)
{
int err;
int len;
--
2.32.0


2021-12-30 01:36:34

by Ammar Faizi

[permalink] [raw]
Subject: [RFC PATCH v1 3/3] io_uring: Add `sendto(2)` and `recvfrom(2)` support

This adds sendto(2) and recvfrom(2) support for io_uring.

New opcodes:
IORING_OP_SENDTO
IORING_OP_RECVFROM

Signed-off-by: Ammar Faizi <[email protected]>
---
fs/io_uring.c | 80 +++++++++++++++++++++++++++++++++--
include/uapi/linux/io_uring.h | 2 +
2 files changed, 78 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index d564f98d5d3b..4406c044798f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -575,7 +575,15 @@ struct io_sr_msg {
union {
struct compat_msghdr __user *umsg_compat;
struct user_msghdr __user *umsg;
- void __user *buf;
+
+ struct {
+ void __user *buf;
+ struct sockaddr __user *addr;
+ union {
+ int sendto_addr_len;
+ int __user *recvfrom_addr_len;
+ };
+ };
};
int msg_flags;
int bgid;
@@ -1133,6 +1141,19 @@ static const struct io_op_def io_op_defs[] = {
.needs_file = 1
},
[IORING_OP_GETXATTR] = {},
+ [IORING_OP_SENDTO] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollout = 1,
+ .audit_skip = 1,
+ },
+ [IORING_OP_RECVFROM] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollin = 1,
+ .buffer_select = 1,
+ .audit_skip = 1,
+ },
};

/* requests with any of those set should undergo io_disarm_next() */
@@ -5216,12 +5237,24 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
return -EINVAL;

+ /*
+ * For IORING_OP_SEND{,TO}, the assignment to @sr->umsg
+ * is equivalent to an assignment to @sr->buf.
+ */
sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
+
sr->len = READ_ONCE(sqe->len);
sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
if (sr->msg_flags & MSG_DONTWAIT)
req->flags |= REQ_F_NOWAIT;

+ if (req->opcode == IORING_OP_SENDTO) {
+ sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+ sr->sendto_addr_len = READ_ONCE(sqe->addr3);
+ } else {
+ sr->addr = (struct sockaddr __user *) NULL;
+ }
+
#ifdef CONFIG_COMPAT
if (req->ctx->compat)
sr->msg_flags |= MSG_CMSG_COMPAT;
@@ -5275,6 +5308,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)

static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
{
+ struct sockaddr_storage address;
struct io_sr_msg *sr = &req->sr_msg;
struct msghdr msg;
struct iovec iov;
@@ -5291,10 +5325,20 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
if (unlikely(ret))
return ret;

- msg.msg_name = NULL;
+
msg.msg_control = NULL;
msg.msg_controllen = 0;
- msg.msg_namelen = 0;
+ if (sr->addr) {
+ ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
+ &address);
+ if (unlikely(ret < 0))
+ goto fail;
+ msg.msg_name = (struct sockaddr *) &address;
+ msg.msg_namelen = sr->sendto_addr_len;
+ } else {
+ msg.msg_name = NULL;
+ msg.msg_namelen = 0;
+ }

flags = req->sr_msg.msg_flags;
if (issue_flags & IO_URING_F_NONBLOCK)
@@ -5309,6 +5353,7 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
return -EAGAIN;
if (ret == -ERESTARTSYS)
ret = -EINTR;
+ fail:
req_set_fail(req);
}
__io_req_complete(req, issue_flags, ret, 0);
@@ -5427,13 +5472,25 @@ static int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
return -EINVAL;

+ /*
+ * For IORING_OP_RECV{,FROM}, the assignment to @sr->umsg
+ * is equivalent to an assignment to @sr->buf.
+ */
sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
+
sr->len = READ_ONCE(sqe->len);
sr->bgid = READ_ONCE(sqe->buf_group);
sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
if (sr->msg_flags & MSG_DONTWAIT)
req->flags |= REQ_F_NOWAIT;

+ if (req->opcode == IORING_OP_RECVFROM) {
+ sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+ sr->recvfrom_addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr3));
+ } else {
+ sr->addr = (struct sockaddr __user *) NULL;
+ }
+
#ifdef CONFIG_COMPAT
if (req->ctx->compat)
sr->msg_flags |= MSG_CMSG_COMPAT;
@@ -5509,6 +5566,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
struct iovec iov;
unsigned flags;
int ret, min_ret = 0;
+ struct sockaddr_storage address;
bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;

sock = sock_from_file(req->file);
@@ -5526,7 +5584,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
if (unlikely(ret))
goto out_free;

- msg.msg_name = NULL;
+ msg.msg_name = sr->addr ? (struct sockaddr *) &address : NULL;
msg.msg_control = NULL;
msg.msg_controllen = 0;
msg.msg_namelen = 0;
@@ -5540,6 +5598,16 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
min_ret = iov_iter_count(&msg.msg_iter);

ret = sock_recvmsg(sock, &msg, flags);
+
+ if (ret >= 0 && sr->addr != NULL) {
+ int tmp;
+
+ tmp = move_addr_to_user(&address, msg.msg_namelen, sr->addr,
+ sr->recvfrom_addr_len);
+ if (tmp < 0)
+ ret = tmp;
+ }
+
out_free:
if (ret < min_ret) {
if (ret == -EAGAIN && force_nonblock)
@@ -6778,9 +6846,11 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
case IORING_OP_SYNC_FILE_RANGE:
return io_sfr_prep(req, sqe);
case IORING_OP_SENDMSG:
+ case IORING_OP_SENDTO:
case IORING_OP_SEND:
return io_sendmsg_prep(req, sqe);
case IORING_OP_RECVMSG:
+ case IORING_OP_RECVFROM:
case IORING_OP_RECV:
return io_recvmsg_prep(req, sqe);
case IORING_OP_CONNECT:
@@ -7060,12 +7130,14 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
case IORING_OP_SENDMSG:
ret = io_sendmsg(req, issue_flags);
break;
+ case IORING_OP_SENDTO:
case IORING_OP_SEND:
ret = io_sendto(req, issue_flags);
break;
case IORING_OP_RECVMSG:
ret = io_recvmsg(req, issue_flags);
break;
+ case IORING_OP_RECVFROM:
case IORING_OP_RECV:
ret = io_recvfrom(req, issue_flags);
break;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index efc7ac9b3a6b..a360069d1e8e 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -150,6 +150,8 @@ enum {
IORING_OP_SETXATTR,
IORING_OP_FGETXATTR,
IORING_OP_GETXATTR,
+ IORING_OP_SENDTO,
+ IORING_OP_RECVFROM,

/* this goes last, obviously */
IORING_OP_LAST,
--
2.32.0


2021-12-30 12:01:14

by Ammar Faizi

[permalink] [raw]
Subject: [RFC PATCH v2 0/3] io_uring: Add sendto(2) and recvfrom(2) support

Hello,

This RFC patchset adds sendto(2) and recvfrom(2) support for io_uring.
It also addresses an issue in the liburing GitHub repository [1].


## Motivations:
1) By using `sendto()` and `recvfrom()` we can make the submission
simpler compared to always using `sendmsg()` and `recvmsg()` from
the userspace.

2) There is a historical patch that tried to add the same
functionality, but did not end up being applied. [2]

On Tue, 7 Jul 2020 12:29:18 -0600, Jens Axboe <[email protected]> wrote:
> In a private conversation with the author, a good point was brought
> up that the sendto/recvfrom do not require an allocation of an async
> context, if we need to defer or go async with the request. I think
> that's a major win, to be honest. There are other benefits as well
> (like shorter path), but to me, the async less part is nice and will
> reduce overhead


## Changes summary
There are 3 patches in this series.

PATCH 1/3 renames io_recv to io_recvfrom and io_send to io_sendto.
Note that

send(sockfd, buf, len, flags);

is equivalent to

sendto(sockfd, buf, len, flags, NULL, 0);

and
recv(sockfd, buf, len, flags);

is equivalent to

recvfrom(sockfd, buf, len, flags, NULL, NULL);

So it is saner to have `send` and `recv` directed to `sendto` and
`recvfrom` instead of the opposite with respect to the name.


PATCH 2/3 makes `move_addr_to_user()` be a non static function. This
function lives in net/socket.c, we need to call this from io_uring
to add `recvfrom()` support for liburing. Added net files maintainers
to the CC list.

PATCH 3/3 adds `sendto(2)` and `recvfrom(2)` support for io_uring.
Added two new opcodes: IORING_OP_SENDTO and IORING_OP_RECVFROM.


## How to test

This patchset is based on "for-next" branch commit:

bb3294e22482db4b7ec7cfbb2d0f5b53c1adcf86 ("Merge branch 'for-5.17/drivers' into for-next")

It is also available in the Git repository at:

https://github.com/ammarfaizi2/linux-block ammarfaizi2/linux-block/io_uring-recvfrom-sendto


I also added the liburing support and test. The liburing support is
based on "xattr-getdents64" branch commit:

55a9bf979f27f3a5c9f456f26dcfe16c4791667b ("src/include/liburing.h: style cleanups")

It is available in the Git repository at:

https://github.com/ammarfaizi2/liburing sendto-recvfrom

---
v2:
- Rebased the work, now this patchset is based on commit
bb3294e22482db4b7ec ("Merge branch 'for-5.17/drivers' into
for-next").

- In `io_recvfrom()`, mark the error check of `move_addr_to_user()`
call as unlikely.

- Fix build error when CONFIG_NET is undefined.

- Update liburing test (the branch is still the same, just force
pushed).

- Added Nugra to CC list (tester).

---
RFC v1: https://lore.kernel.org/io-uring/[email protected]/
Link: https://github.com/axboe/liburing/issues/397 [1]
Link: https://lore.kernel.org/io-uring/[email protected]/ [2]
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: [email protected]
Cc: Nugra <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---
Ammar Faizi (3):
io_uring: Rename `io_{send,recv}` to `io_{sendto,recvfrom}`
net: Make `move_addr_to_user()` be a non static function
io_uring: Add `sendto(2)` and `recvfrom(2)` support

fs/io_uring.c | 92 +++++++++++++++++++++++++++++++----
include/linux/socket.h | 2 +
include/uapi/linux/io_uring.h | 2 +
net/socket.c | 4 +-
4 files changed, 88 insertions(+), 12 deletions(-)


base-commit: bb3294e22482db4b7ec7cfbb2d0f5b53c1adcf86
--
2.32.0


2021-12-30 12:01:23

by Ammar Faizi

[permalink] [raw]
Subject: [RFC PATCH v2 1/3] io_uring: Rename `io_{send,recv}` to `io_{sendto,recvfrom}`

Currently we can perform `send` and `recv` via io_uring. And now, we
are going to add `sendto` and `recvfrom` support for io_uring.

Note that:
Calling `send(fd, buf, len, flags)` is equivalent to calling
`sendto(fd, buf, len, flags, NULL, 0)`. Therefore, `sendto`
is a superset of `send`.

Calling `recv(fd, buf, len, flags)` is equivalent to calling
`recvfrom(fd, buf, len, flags, NULL, NULL)`. Therefore, `recvfrom`
is a superset of `recv`.

As such, let's direct the current supported `IORING_OP_{SEND,RECV}` to
`io_{sendto,recvfrom}`. These functions will also be used for
`IORING_OP_{SENDTO,RECVFROM}` operation in the next patches.

Cc: Nugra <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---
fs/io_uring.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 90002bb3fdf4..d564f98d5d3b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5273,7 +5273,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
return 0;
}

-static int io_send(struct io_kiocb *req, unsigned int issue_flags)
+static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_sr_msg *sr = &req->sr_msg;
struct msghdr msg;
@@ -5499,7 +5499,7 @@ static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
return 0;
}

-static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
+static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_buffer *kbuf;
struct io_sr_msg *sr = &req->sr_msg;
@@ -7061,13 +7061,13 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
ret = io_sendmsg(req, issue_flags);
break;
case IORING_OP_SEND:
- ret = io_send(req, issue_flags);
+ ret = io_sendto(req, issue_flags);
break;
case IORING_OP_RECVMSG:
ret = io_recvmsg(req, issue_flags);
break;
case IORING_OP_RECV:
- ret = io_recv(req, issue_flags);
+ ret = io_recvfrom(req, issue_flags);
break;
case IORING_OP_TIMEOUT:
ret = io_timeout(req, issue_flags);
--
2.32.0


2021-12-30 12:01:30

by Ammar Faizi

[permalink] [raw]
Subject: [RFC PATCH v2 2/3] net: Make `move_addr_to_user()` be a non static function

In order to add recvfrom support for io_uring, we need to call
`move_addr_to_user()` in fs/io_uring.c.

This makes `move_addr_to_user()` be a non static function so we can
call it from io_uring.

Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: [email protected]
Cc: Nugra <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---
include/linux/socket.h | 2 ++
net/socket.c | 4 ++--
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 8ef26d89ef49..0d0bc1ace50c 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -371,6 +371,8 @@ struct ucred {
#define IPX_TYPE 1

extern int move_addr_to_kernel(void __user *uaddr, int ulen, struct sockaddr_storage *kaddr);
+extern int move_addr_to_user(struct sockaddr_storage *kaddr, int klen,
+ void __user *uaddr, int __user *ulen);
extern int put_cmsg(struct msghdr*, int level, int type, int len, void *data);

struct timespec64;
diff --git a/net/socket.c b/net/socket.c
index 7f64a6eccf63..af521d351c8a 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -267,8 +267,8 @@ int move_addr_to_kernel(void __user *uaddr, int ulen, struct sockaddr_storage *k
* specified. Zero is returned for a success.
*/

-static int move_addr_to_user(struct sockaddr_storage *kaddr, int klen,
- void __user *uaddr, int __user *ulen)
+int move_addr_to_user(struct sockaddr_storage *kaddr, int klen,
+ void __user *uaddr, int __user *ulen)
{
int err;
int len;
--
2.32.0


2021-12-30 12:01:31

by Ammar Faizi

[permalink] [raw]
Subject: [RFC PATCH v2 3/3] io_uring: Add `sendto(2)` and `recvfrom(2)` support

This adds sendto(2) and recvfrom(2) support for io_uring.

New opcodes:
IORING_OP_SENDTO
IORING_OP_RECVFROM

Cc: Nugra <[email protected]>
Link: https://github.com/axboe/liburing/issues/397
Signed-off-by: Ammar Faizi <[email protected]>
---

v2:
- In `io_recvfrom()`, mark the error check of `move_addr_to_user()`
call as unlikely.

- Fix build error when CONFIG_NET is undefined.

fs/io_uring.c | 84 ++++++++++++++++++++++++++++++++---
include/uapi/linux/io_uring.h | 2 +
2 files changed, 80 insertions(+), 6 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index d564f98d5d3b..3726958f8f58 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -575,7 +575,15 @@ struct io_sr_msg {
union {
struct compat_msghdr __user *umsg_compat;
struct user_msghdr __user *umsg;
- void __user *buf;
+
+ struct {
+ void __user *buf;
+ struct sockaddr __user *addr;
+ union {
+ int sendto_addr_len;
+ int __user *recvfrom_addr_len;
+ };
+ };
};
int msg_flags;
int bgid;
@@ -1133,6 +1141,19 @@ static const struct io_op_def io_op_defs[] = {
.needs_file = 1
},
[IORING_OP_GETXATTR] = {},
+ [IORING_OP_SENDTO] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollout = 1,
+ .audit_skip = 1,
+ },
+ [IORING_OP_RECVFROM] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollin = 1,
+ .buffer_select = 1,
+ .audit_skip = 1,
+ },
};

/* requests with any of those set should undergo io_disarm_next() */
@@ -5216,12 +5237,24 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
return -EINVAL;

+ /*
+ * For IORING_OP_SEND{,TO}, the assignment to @sr->umsg
+ * is equivalent to an assignment to @sr->buf.
+ */
sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
+
sr->len = READ_ONCE(sqe->len);
sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
if (sr->msg_flags & MSG_DONTWAIT)
req->flags |= REQ_F_NOWAIT;

+ if (req->opcode == IORING_OP_SENDTO) {
+ sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+ sr->sendto_addr_len = READ_ONCE(sqe->addr3);
+ } else {
+ sr->addr = (struct sockaddr __user *) NULL;
+ }
+
#ifdef CONFIG_COMPAT
if (req->ctx->compat)
sr->msg_flags |= MSG_CMSG_COMPAT;
@@ -5275,6 +5308,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)

static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
{
+ struct sockaddr_storage address;
struct io_sr_msg *sr = &req->sr_msg;
struct msghdr msg;
struct iovec iov;
@@ -5291,10 +5325,20 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
if (unlikely(ret))
return ret;

- msg.msg_name = NULL;
+
msg.msg_control = NULL;
msg.msg_controllen = 0;
- msg.msg_namelen = 0;
+ if (sr->addr) {
+ ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
+ &address);
+ if (unlikely(ret < 0))
+ goto fail;
+ msg.msg_name = (struct sockaddr *) &address;
+ msg.msg_namelen = sr->sendto_addr_len;
+ } else {
+ msg.msg_name = NULL;
+ msg.msg_namelen = 0;
+ }

flags = req->sr_msg.msg_flags;
if (issue_flags & IO_URING_F_NONBLOCK)
@@ -5309,6 +5353,7 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
return -EAGAIN;
if (ret == -ERESTARTSYS)
ret = -EINTR;
+ fail:
req_set_fail(req);
}
__io_req_complete(req, issue_flags, ret, 0);
@@ -5427,13 +5472,25 @@ static int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
return -EINVAL;

+ /*
+ * For IORING_OP_RECV{,FROM}, the assignment to @sr->umsg
+ * is equivalent to an assignment to @sr->buf.
+ */
sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
+
sr->len = READ_ONCE(sqe->len);
sr->bgid = READ_ONCE(sqe->buf_group);
sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
if (sr->msg_flags & MSG_DONTWAIT)
req->flags |= REQ_F_NOWAIT;

+ if (req->opcode == IORING_OP_RECVFROM) {
+ sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+ sr->recvfrom_addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr3));
+ } else {
+ sr->addr = (struct sockaddr __user *) NULL;
+ }
+
#ifdef CONFIG_COMPAT
if (req->ctx->compat)
sr->msg_flags |= MSG_CMSG_COMPAT;
@@ -5509,6 +5566,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
struct iovec iov;
unsigned flags;
int ret, min_ret = 0;
+ struct sockaddr_storage address;
bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;

sock = sock_from_file(req->file);
@@ -5526,7 +5584,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
if (unlikely(ret))
goto out_free;

- msg.msg_name = NULL;
+ msg.msg_name = sr->addr ? (struct sockaddr *) &address : NULL;
msg.msg_control = NULL;
msg.msg_controllen = 0;
msg.msg_namelen = 0;
@@ -5540,6 +5598,16 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
min_ret = iov_iter_count(&msg.msg_iter);

ret = sock_recvmsg(sock, &msg, flags);
+
+ if (ret >= 0 && sr->addr != NULL) {
+ int tmp;
+
+ tmp = move_addr_to_user(&address, msg.msg_namelen, sr->addr,
+ sr->recvfrom_addr_len);
+ if (unlikely(tmp < 0))
+ ret = tmp;
+ }
+
out_free:
if (ret < min_ret) {
if (ret == -EAGAIN && force_nonblock)
@@ -5707,8 +5775,8 @@ IO_NETOP_PREP_ASYNC(sendmsg);
IO_NETOP_PREP_ASYNC(recvmsg);
IO_NETOP_PREP_ASYNC(connect);
IO_NETOP_PREP(accept);
-IO_NETOP_FN(send);
-IO_NETOP_FN(recv);
+IO_NETOP_FN(sendto);
+IO_NETOP_FN(recvfrom);
#endif /* CONFIG_NET */

struct io_poll_table {
@@ -6778,9 +6846,11 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
case IORING_OP_SYNC_FILE_RANGE:
return io_sfr_prep(req, sqe);
case IORING_OP_SENDMSG:
+ case IORING_OP_SENDTO:
case IORING_OP_SEND:
return io_sendmsg_prep(req, sqe);
case IORING_OP_RECVMSG:
+ case IORING_OP_RECVFROM:
case IORING_OP_RECV:
return io_recvmsg_prep(req, sqe);
case IORING_OP_CONNECT:
@@ -7060,12 +7130,14 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
case IORING_OP_SENDMSG:
ret = io_sendmsg(req, issue_flags);
break;
+ case IORING_OP_SENDTO:
case IORING_OP_SEND:
ret = io_sendto(req, issue_flags);
break;
case IORING_OP_RECVMSG:
ret = io_recvmsg(req, issue_flags);
break;
+ case IORING_OP_RECVFROM:
case IORING_OP_RECV:
ret = io_recvfrom(req, issue_flags);
break;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index efc7ac9b3a6b..a360069d1e8e 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -150,6 +150,8 @@ enum {
IORING_OP_SETXATTR,
IORING_OP_FGETXATTR,
IORING_OP_GETXATTR,
+ IORING_OP_SENDTO,
+ IORING_OP_RECVFROM,

/* this goes last, obviously */
IORING_OP_LAST,
--
2.32.0


2021-12-30 17:52:43

by Ammar Faizi

[permalink] [raw]
Subject: [RFC PATCH v3 0/3] io_uring: Add sendto(2) and recvfrom(2) support

Hello,

This RFC patchset adds sendto(2) and recvfrom(2) support for io_uring.
It also addresses an issue in the liburing GitHub repository [1].


## Motivations:
1) By using `sendto()` and `recvfrom()` we can make the submission
simpler compared to always using `sendmsg()` and `recvmsg()` from
the userspace.

2) There is a historical patch that tried to add the same
functionality, but did not end up being applied. [2]

On Tue, 7 Jul 2020 12:29:18 -0600, Jens Axboe <[email protected]> wrote:
> In a private conversation with the author, a good point was brought
> up that the sendto/recvfrom do not require an allocation of an async
> context, if we need to defer or go async with the request. I think
> that's a major win, to be honest. There are other benefits as well
> (like shorter path), but to me, the async less part is nice and will
> reduce overhead


## Changes summary
There are 3 patches in this series.

PATCH 1/3 renames io_recv to io_recvfrom and io_send to io_sendto.
Note that

send(sockfd, buf, len, flags);

is equivalent to

sendto(sockfd, buf, len, flags, NULL, 0);

and
recv(sockfd, buf, len, flags);

is equivalent to

recvfrom(sockfd, buf, len, flags, NULL, NULL);

So it is saner to have `send` and `recv` directed to `sendto` and
`recvfrom` instead of the opposite with respect to the name.


PATCH 2/3 makes `move_addr_to_user()` be a non static function. This
function lives in net/socket.c, we need to call this from io_uring
to add `recvfrom()` support for liburing. Added net files maintainers
to the CC list.

PATCH 3/3 adds `sendto(2)` and `recvfrom(2)` support for io_uring.
Added two new opcodes: IORING_OP_SENDTO and IORING_OP_RECVFROM.


## How to test

This patchset is based on "for-next" branch commit:

bb3294e22482db4b7ec7cfbb2d0f5b53c1adcf86 ("Merge branch 'for-5.17/drivers' into for-next")

It is also available in the Git repository at:

https://github.com/ammarfaizi2/linux-block ammarfaizi2/linux-block/io_uring-recvfrom-sendto


I also added the liburing support and test. The liburing support is
based on "xattr-getdents64" branch commit:

55a9bf979f27f3a5c9f456f26dcfe16c4791667b ("src/include/liburing.h: style cleanups")

It is available in the Git repository at:

https://github.com/ammarfaizi2/liburing sendto-recvfrom

---
v3:
- Fix build error when CONFIG_NET is undefined for PATCH 1/3. I
tried to fix it in PATCH 3/3, but it should be fixed in PATCH 1/3,
otherwise it breaks the build in PATCH 1/3.

- Added `io_uring_prep_{sendto,recvfrom}` docs to the liburing.

v2:
- Rebased the work, now this patchset is based on commit
bb3294e22482db4b7ec ("Merge branch 'for-5.17/drivers' into
for-next").

- In `io_recvfrom()`, mark the error check of `move_addr_to_user()`
call as unlikely.

- Fix build error when CONFIG_NET is undefined.

- Update liburing test (the branch is still the same, just force
pushed).

- Added Nugra to CC list (tester).

---
RFC v2: https://lore.kernel.org/io-uring/[email protected]/
RFC v1: https://lore.kernel.org/io-uring/[email protected]/
Link: https://github.com/axboe/liburing/issues/397 [1]
Link: https://lore.kernel.org/io-uring/[email protected]/ [2]
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: [email protected]
Cc: Nugra <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---

Ammar Faizi (3):
io_uring: Rename `io_{send,recv}` to `io_{sendto,recvfrom}`
net: Make `move_addr_to_user()` be a non static function
io_uring: Add `sendto(2)` and `recvfrom(2)` support

fs/io_uring.c | 92 +++++++++++++++++++++++++++++++----
include/linux/socket.h | 2 +
include/uapi/linux/io_uring.h | 2 +
net/socket.c | 4 +-
4 files changed, 88 insertions(+), 12 deletions(-)


base-commit: bb3294e22482db4b7ec7cfbb2d0f5b53c1adcf86
--
Ammar Faizi


2021-12-30 17:52:47

by Ammar Faizi

[permalink] [raw]
Subject: [RFC PATCH v3 1/3] io_uring: Rename `io_{send,recv}` to `io_{sendto,recvfrom}`

Currently we can perform `send` and `recv` via io_uring. And now, we
are going to add `sendto` and `recvfrom` support for io_uring.

Note that:
Calling `send(fd, buf, len, flags)` is equivalent to calling
`sendto(fd, buf, len, flags, NULL, 0)`. Therefore, `sendto`
is a superset of `send`.

Calling `recv(fd, buf, len, flags)` is equivalent to calling
`recvfrom(fd, buf, len, flags, NULL, NULL)`. Therefore, `recvfrom`
is a superset of `recv`.

As such, let's direct the current supported `IORING_OP_{SEND,RECV}` to
`io_{sendto,recvfrom}`. These functions will also be used for
`IORING_OP_{SENDTO,RECVFROM}` operation in the next patches.

Cc: Nugra <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---

v3:
- Fix build error when CONFIG_NET is undefined for PATCH 1/3. I
tried to fix it in PATCH 3/3, but it should be fixed in PATCH 1/3,
otherwise it breaks the build in PATCH 1/3.

v2:
- Added Nugra to CC list (tester).
---
fs/io_uring.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 90002bb3fdf4..7adcb591398f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5273,7 +5273,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
return 0;
}

-static int io_send(struct io_kiocb *req, unsigned int issue_flags)
+static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_sr_msg *sr = &req->sr_msg;
struct msghdr msg;
@@ -5499,7 +5499,7 @@ static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
return 0;
}

-static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
+static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_buffer *kbuf;
struct io_sr_msg *sr = &req->sr_msg;
@@ -5707,8 +5707,8 @@ IO_NETOP_PREP_ASYNC(sendmsg);
IO_NETOP_PREP_ASYNC(recvmsg);
IO_NETOP_PREP_ASYNC(connect);
IO_NETOP_PREP(accept);
-IO_NETOP_FN(send);
-IO_NETOP_FN(recv);
+IO_NETOP_FN(sendto);
+IO_NETOP_FN(recvfrom);
#endif /* CONFIG_NET */

struct io_poll_table {
@@ -7061,13 +7061,13 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
ret = io_sendmsg(req, issue_flags);
break;
case IORING_OP_SEND:
- ret = io_send(req, issue_flags);
+ ret = io_sendto(req, issue_flags);
break;
case IORING_OP_RECVMSG:
ret = io_recvmsg(req, issue_flags);
break;
case IORING_OP_RECV:
- ret = io_recv(req, issue_flags);
+ ret = io_recvfrom(req, issue_flags);
break;
case IORING_OP_TIMEOUT:
ret = io_timeout(req, issue_flags);
--
2.32.0


2021-12-30 17:52:50

by Ammar Faizi

[permalink] [raw]
Subject: [RFC PATCH v3 2/3] net: Make `move_addr_to_user()` be a non static function

In order to add recvfrom support for io_uring, we need to call
`move_addr_to_user()` in fs/io_uring.c.

This makes `move_addr_to_user()` be a non static function so we can
call it from io_uring.

Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: [email protected]
Cc: Nugra <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---

v3:
* No changes *

v2:
- Added Nugra to CC list (tester).
---
include/linux/socket.h | 2 ++
net/socket.c | 4 ++--
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 8ef26d89ef49..0d0bc1ace50c 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -371,6 +371,8 @@ struct ucred {
#define IPX_TYPE 1

extern int move_addr_to_kernel(void __user *uaddr, int ulen, struct sockaddr_storage *kaddr);
+extern int move_addr_to_user(struct sockaddr_storage *kaddr, int klen,
+ void __user *uaddr, int __user *ulen);
extern int put_cmsg(struct msghdr*, int level, int type, int len, void *data);

struct timespec64;
diff --git a/net/socket.c b/net/socket.c
index 7f64a6eccf63..af521d351c8a 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -267,8 +267,8 @@ int move_addr_to_kernel(void __user *uaddr, int ulen, struct sockaddr_storage *k
* specified. Zero is returned for a success.
*/

-static int move_addr_to_user(struct sockaddr_storage *kaddr, int klen,
- void __user *uaddr, int __user *ulen)
+int move_addr_to_user(struct sockaddr_storage *kaddr, int klen,
+ void __user *uaddr, int __user *ulen)
{
int err;
int len;
--
2.32.0


2021-12-30 17:52:54

by Ammar Faizi

[permalink] [raw]
Subject: [RFC PATCH v3 3/3] io_uring: Add `sendto(2)` and `recvfrom(2)` support

This adds sendto(2) and recvfrom(2) support for io_uring.

New opcodes:
IORING_OP_SENDTO
IORING_OP_RECVFROM

Cc: Nugra <[email protected]>
Tested-by: Nugra <[email protected]>
Link: https://github.com/axboe/liburing/issues/397
Signed-off-by: Ammar Faizi <[email protected]>
---

v3:
- Fix build error when CONFIG_NET is undefined should be done in
the first patch, not this patch.

- Add Tested-by tag from Nugra.

v2:
- In `io_recvfrom()`, mark the error check of `move_addr_to_user()`
call as unlikely.

- Fix build error when CONFIG_NET is undefined.

- Added Nugra to CC list (tester).
---
fs/io_uring.c | 80 +++++++++++++++++++++++++++++++++--
include/uapi/linux/io_uring.h | 2 +
2 files changed, 78 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7adcb591398f..3726958f8f58 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -575,7 +575,15 @@ struct io_sr_msg {
union {
struct compat_msghdr __user *umsg_compat;
struct user_msghdr __user *umsg;
- void __user *buf;
+
+ struct {
+ void __user *buf;
+ struct sockaddr __user *addr;
+ union {
+ int sendto_addr_len;
+ int __user *recvfrom_addr_len;
+ };
+ };
};
int msg_flags;
int bgid;
@@ -1133,6 +1141,19 @@ static const struct io_op_def io_op_defs[] = {
.needs_file = 1
},
[IORING_OP_GETXATTR] = {},
+ [IORING_OP_SENDTO] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollout = 1,
+ .audit_skip = 1,
+ },
+ [IORING_OP_RECVFROM] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollin = 1,
+ .buffer_select = 1,
+ .audit_skip = 1,
+ },
};

/* requests with any of those set should undergo io_disarm_next() */
@@ -5216,12 +5237,24 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
return -EINVAL;

+ /*
+ * For IORING_OP_SEND{,TO}, the assignment to @sr->umsg
+ * is equivalent to an assignment to @sr->buf.
+ */
sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
+
sr->len = READ_ONCE(sqe->len);
sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
if (sr->msg_flags & MSG_DONTWAIT)
req->flags |= REQ_F_NOWAIT;

+ if (req->opcode == IORING_OP_SENDTO) {
+ sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+ sr->sendto_addr_len = READ_ONCE(sqe->addr3);
+ } else {
+ sr->addr = (struct sockaddr __user *) NULL;
+ }
+
#ifdef CONFIG_COMPAT
if (req->ctx->compat)
sr->msg_flags |= MSG_CMSG_COMPAT;
@@ -5275,6 +5308,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)

static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
{
+ struct sockaddr_storage address;
struct io_sr_msg *sr = &req->sr_msg;
struct msghdr msg;
struct iovec iov;
@@ -5291,10 +5325,20 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
if (unlikely(ret))
return ret;

- msg.msg_name = NULL;
+
msg.msg_control = NULL;
msg.msg_controllen = 0;
- msg.msg_namelen = 0;
+ if (sr->addr) {
+ ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
+ &address);
+ if (unlikely(ret < 0))
+ goto fail;
+ msg.msg_name = (struct sockaddr *) &address;
+ msg.msg_namelen = sr->sendto_addr_len;
+ } else {
+ msg.msg_name = NULL;
+ msg.msg_namelen = 0;
+ }

flags = req->sr_msg.msg_flags;
if (issue_flags & IO_URING_F_NONBLOCK)
@@ -5309,6 +5353,7 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
return -EAGAIN;
if (ret == -ERESTARTSYS)
ret = -EINTR;
+ fail:
req_set_fail(req);
}
__io_req_complete(req, issue_flags, ret, 0);
@@ -5427,13 +5472,25 @@ static int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
return -EINVAL;

+ /*
+ * For IORING_OP_RECV{,FROM}, the assignment to @sr->umsg
+ * is equivalent to an assignment to @sr->buf.
+ */
sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
+
sr->len = READ_ONCE(sqe->len);
sr->bgid = READ_ONCE(sqe->buf_group);
sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
if (sr->msg_flags & MSG_DONTWAIT)
req->flags |= REQ_F_NOWAIT;

+ if (req->opcode == IORING_OP_RECVFROM) {
+ sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+ sr->recvfrom_addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr3));
+ } else {
+ sr->addr = (struct sockaddr __user *) NULL;
+ }
+
#ifdef CONFIG_COMPAT
if (req->ctx->compat)
sr->msg_flags |= MSG_CMSG_COMPAT;
@@ -5509,6 +5566,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
struct iovec iov;
unsigned flags;
int ret, min_ret = 0;
+ struct sockaddr_storage address;
bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;

sock = sock_from_file(req->file);
@@ -5526,7 +5584,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
if (unlikely(ret))
goto out_free;

- msg.msg_name = NULL;
+ msg.msg_name = sr->addr ? (struct sockaddr *) &address : NULL;
msg.msg_control = NULL;
msg.msg_controllen = 0;
msg.msg_namelen = 0;
@@ -5540,6 +5598,16 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
min_ret = iov_iter_count(&msg.msg_iter);

ret = sock_recvmsg(sock, &msg, flags);
+
+ if (ret >= 0 && sr->addr != NULL) {
+ int tmp;
+
+ tmp = move_addr_to_user(&address, msg.msg_namelen, sr->addr,
+ sr->recvfrom_addr_len);
+ if (unlikely(tmp < 0))
+ ret = tmp;
+ }
+
out_free:
if (ret < min_ret) {
if (ret == -EAGAIN && force_nonblock)
@@ -6778,9 +6846,11 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
case IORING_OP_SYNC_FILE_RANGE:
return io_sfr_prep(req, sqe);
case IORING_OP_SENDMSG:
+ case IORING_OP_SENDTO:
case IORING_OP_SEND:
return io_sendmsg_prep(req, sqe);
case IORING_OP_RECVMSG:
+ case IORING_OP_RECVFROM:
case IORING_OP_RECV:
return io_recvmsg_prep(req, sqe);
case IORING_OP_CONNECT:
@@ -7060,12 +7130,14 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
case IORING_OP_SENDMSG:
ret = io_sendmsg(req, issue_flags);
break;
+ case IORING_OP_SENDTO:
case IORING_OP_SEND:
ret = io_sendto(req, issue_flags);
break;
case IORING_OP_RECVMSG:
ret = io_recvmsg(req, issue_flags);
break;
+ case IORING_OP_RECVFROM:
case IORING_OP_RECV:
ret = io_recvfrom(req, issue_flags);
break;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index efc7ac9b3a6b..a360069d1e8e 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -150,6 +150,8 @@ enum {
IORING_OP_SETXATTR,
IORING_OP_FGETXATTR,
IORING_OP_GETXATTR,
+ IORING_OP_SENDTO,
+ IORING_OP_RECVFROM,

/* this goes last, obviously */
IORING_OP_LAST,
--
2.32.0


2022-01-06 17:32:11

by Praveen Kumar

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] io_uring: Add `sendto(2)` and `recvfrom(2)` support

On 30-12-2021 23:22, Ammar Faizi wrote:
> This adds sendto(2) and recvfrom(2) support for io_uring.
>
> New opcodes:
> IORING_OP_SENDTO
> IORING_OP_RECVFROM
>
> Cc: Nugra <[email protected]>
> Tested-by: Nugra <[email protected]>
> Link: https://github.com/axboe/liburing/issues/397
> Signed-off-by: Ammar Faizi <[email protected]>
> ---
>
> v3:
> - Fix build error when CONFIG_NET is undefined should be done in
> the first patch, not this patch.
>
> - Add Tested-by tag from Nugra.
>
> v2:
> - In `io_recvfrom()`, mark the error check of `move_addr_to_user()`
> call as unlikely.
>
> - Fix build error when CONFIG_NET is undefined.
>
> - Added Nugra to CC list (tester).
> ---
> fs/io_uring.c | 80 +++++++++++++++++++++++++++++++++--
> include/uapi/linux/io_uring.h | 2 +
> 2 files changed, 78 insertions(+), 4 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 7adcb591398f..3726958f8f58 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -575,7 +575,15 @@ struct io_sr_msg {
> union {
> struct compat_msghdr __user *umsg_compat;
> struct user_msghdr __user *umsg;
> - void __user *buf;
> +
> + struct {
> + void __user *buf;
> + struct sockaddr __user *addr;
> + union {
> + int sendto_addr_len;
> + int __user *recvfrom_addr_len;
> + };
> + };
> };
> int msg_flags;
> int bgid;
> @@ -1133,6 +1141,19 @@ static const struct io_op_def io_op_defs[] = {
> .needs_file = 1
> },
> [IORING_OP_GETXATTR] = {},
> + [IORING_OP_SENDTO] = {
> + .needs_file = 1,
> + .unbound_nonreg_file = 1,
> + .pollout = 1,
> + .audit_skip = 1,
> + },
> + [IORING_OP_RECVFROM] = {
> + .needs_file = 1,
> + .unbound_nonreg_file = 1,
> + .pollin = 1,
> + .buffer_select = 1,
> + .audit_skip = 1,
> + },
> };
>
> /* requests with any of those set should undergo io_disarm_next() */
> @@ -5216,12 +5237,24 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
> return -EINVAL;
>
> + /*
> + * For IORING_OP_SEND{,TO}, the assignment to @sr->umsg
> + * is equivalent to an assignment to @sr->buf.
> + */
> sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
> +
> sr->len = READ_ONCE(sqe->len);
> sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
> if (sr->msg_flags & MSG_DONTWAIT)
> req->flags |= REQ_F_NOWAIT;
>
> + if (req->opcode == IORING_OP_SENDTO) {
> + sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
> + sr->sendto_addr_len = READ_ONCE(sqe->addr3);
> + } else {
> + sr->addr = (struct sockaddr __user *) NULL;

Let's have sendto_addr_len = 0

> + }
> +
> #ifdef CONFIG_COMPAT
> if (req->ctx->compat)
> sr->msg_flags |= MSG_CMSG_COMPAT;
> @@ -5275,6 +5308,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
>
> static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
> {
> + struct sockaddr_storage address;
> struct io_sr_msg *sr = &req->sr_msg;
> struct msghdr msg;
> struct iovec iov;
> @@ -5291,10 +5325,20 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
> if (unlikely(ret))
> return ret;
>
> - msg.msg_name = NULL;
> +
> msg.msg_control = NULL;
> msg.msg_controllen = 0;
> - msg.msg_namelen = 0;
> + if (sr->addr) {
> + ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
> + &address);
> + if (unlikely(ret < 0))
> + goto fail;
> + msg.msg_name = (struct sockaddr *) &address;
> + msg.msg_namelen = sr->sendto_addr_len;
> + } else {
> + msg.msg_name = NULL;
> + msg.msg_namelen = 0;
> + }
>
> flags = req->sr_msg.msg_flags;
> if (issue_flags & IO_URING_F_NONBLOCK)
> @@ -5309,6 +5353,7 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
> return -EAGAIN;
> if (ret == -ERESTARTSYS)
> ret = -EINTR;
> + fail:
> req_set_fail(req);

I think there is a problem with "fail" goto statement. Not getting full clarity on this change. With latest kernel, I see req_set_fail(req) inside if check, which I don't see here. Can you please resend the patch on latest kernel version. Thanks.

> }
> __io_req_complete(req, issue_flags, ret, 0);
> @@ -5427,13 +5472,25 @@ static int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
> return -EINVAL;
>
> + /*
> + * For IORING_OP_RECV{,FROM}, the assignment to @sr->umsg
> + * is equivalent to an assignment to @sr->buf.
> + */
> sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
> +
> sr->len = READ_ONCE(sqe->len);
> sr->bgid = READ_ONCE(sqe->buf_group);
> sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
> if (sr->msg_flags & MSG_DONTWAIT)
> req->flags |= REQ_F_NOWAIT;
>
> + if (req->opcode == IORING_OP_RECVFROM) {
> + sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
> + sr->recvfrom_addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr3));
> + } else {
> + sr->addr = (struct sockaddr __user *) NULL;

I think recvfrom_addr_len should also be pointed to NULL, instead of garbage for this case.

> + }
> +
> #ifdef CONFIG_COMPAT
> if (req->ctx->compat)
> sr->msg_flags |= MSG_CMSG_COMPAT;
> @@ -5509,6 +5566,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
> struct iovec iov;
> unsigned flags;
> int ret, min_ret = 0;
> + struct sockaddr_storage address;
> bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
>
> sock = sock_from_file(req->file);
> @@ -5526,7 +5584,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
> if (unlikely(ret))
> goto out_free;
>
> - msg.msg_name = NULL;
> + msg.msg_name = sr->addr ? (struct sockaddr *) &address : NULL;
> msg.msg_control = NULL;
> msg.msg_controllen = 0;
> msg.msg_namelen = 0;

I think namelen should also be updated ?

> @@ -5540,6 +5598,16 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
> min_ret = iov_iter_count(&msg.msg_iter);
>
> ret = sock_recvmsg(sock, &msg, flags);
> +
> + if (ret >= 0 && sr->addr != NULL) {
> + int tmp;
> +
> + tmp = move_addr_to_user(&address, msg.msg_namelen, sr->addr,
> + sr->recvfrom_addr_len);
> + if (unlikely(tmp < 0))
> + ret = tmp;
> + }
> +
> out_free:
> if (ret < min_ret) {
> if (ret == -EAGAIN && force_nonblock)
> @@ -6778,9 +6846,11 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> case IORING_OP_SYNC_FILE_RANGE:
> return io_sfr_prep(req, sqe);
> case IORING_OP_SENDMSG:
> + case IORING_OP_SENDTO:
> case IORING_OP_SEND:
> return io_sendmsg_prep(req, sqe);
> case IORING_OP_RECVMSG:
> + case IORING_OP_RECVFROM:
> case IORING_OP_RECV:
> return io_recvmsg_prep(req, sqe);
> case IORING_OP_CONNECT:
> @@ -7060,12 +7130,14 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
> case IORING_OP_SENDMSG:
> ret = io_sendmsg(req, issue_flags);
> break;
> + case IORING_OP_SENDTO:
> case IORING_OP_SEND:
> ret = io_sendto(req, issue_flags);
> break;
> case IORING_OP_RECVMSG:
> ret = io_recvmsg(req, issue_flags);
> break;
> + case IORING_OP_RECVFROM:
> case IORING_OP_RECV:
> ret = io_recvfrom(req, issue_flags);
> break;
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index efc7ac9b3a6b..a360069d1e8e 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -150,6 +150,8 @@ enum {
> IORING_OP_SETXATTR,
> IORING_OP_FGETXATTR,
> IORING_OP_GETXATTR,
> + IORING_OP_SENDTO,
> + IORING_OP_RECVFROM,
>
> /* this goes last, obviously */
> IORING_OP_LAST,


Regards,

~Praveen.

2022-01-06 20:47:15

by Ammar Faizi

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] io_uring: Add `sendto(2)` and `recvfrom(2)` support


On Thu, 6 Jan 2022 at 23:01:59 +0530, Praveen Kumar <[email protected]> wrote:
> On 30-12-2021 23:22, Ammar Faizi wrote:
>> This adds sendto(2) and recvfrom(2) support for io_uring.
>>
>> New opcodes:
>> IORING_OP_SENDTO
>> IORING_OP_RECVFROM
>>
>> Cc: Nugra <[email protected]>
>> Tested-by: Nugra <[email protected]>
>> Link: https://github.com/axboe/liburing/issues/397
>> Signed-off-by: Ammar Faizi <[email protected]>
>> ---
>>
>> v3:
>> - Fix build error when CONFIG_NET is undefined should be done in
>> the first patch, not this patch.
>>
>> - Add Tested-by tag from Nugra.
>>
>> v2:
>> - In `io_recvfrom()`, mark the error check of `move_addr_to_user()`
>> call as unlikely.
>>
>> - Fix build error when CONFIG_NET is undefined.
>>
>> - Added Nugra to CC list (tester).
>> ---
>> fs/io_uring.c | 80 +++++++++++++++++++++++++++++++++--
>> include/uapi/linux/io_uring.h | 2 +
>> 2 files changed, 78 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 7adcb591398f..3726958f8f58 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -575,7 +575,15 @@ struct io_sr_msg {
>> union {
>> struct compat_msghdr __user *umsg_compat;
>> struct user_msghdr __user *umsg;
>> - void __user *buf;
>> +
>> + struct {
>> + void __user *buf;
>> + struct sockaddr __user *addr;
>> + union {
>> + int sendto_addr_len;
>> + int __user *recvfrom_addr_len;
>> + };
>> + };
>> };
>> int msg_flags;
>> int bgid;
>> @@ -1133,6 +1141,19 @@ static const struct io_op_def io_op_defs[] = {
>> .needs_file = 1
>> },
>> [IORING_OP_GETXATTR] = {},
>> + [IORING_OP_SENDTO] = {
>> + .needs_file = 1,
>> + .unbound_nonreg_file = 1,
>> + .pollout = 1,
>> + .audit_skip = 1,
>> + },
>> + [IORING_OP_RECVFROM] = {
>> + .needs_file = 1,
>> + .unbound_nonreg_file = 1,
>> + .pollin = 1,
>> + .buffer_select = 1,
>> + .audit_skip = 1,
>> + },
>> };
>>
>> /* requests with any of those set should undergo io_disarm_next() */
>> @@ -5216,12 +5237,24 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>> return -EINVAL;
>>
>> + /*
>> + * For IORING_OP_SEND{,TO}, the assignment to @sr->umsg
>> + * is equivalent to an assignment to @sr->buf.
>> + */
>> sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
>> +
>> sr->len = READ_ONCE(sqe->len);
>> sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
>> if (sr->msg_flags & MSG_DONTWAIT)
>> req->flags |= REQ_F_NOWAIT;
>>
>> + if (req->opcode == IORING_OP_SENDTO) {
>> + sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
>> + sr->sendto_addr_len = READ_ONCE(sqe->addr3);
>> + } else {
>> + sr->addr = (struct sockaddr __user *) NULL;
>
> Let's have sendto_addr_len = 0

Will do in the RFC v5.

>
>> + }
>> +
>> #ifdef CONFIG_COMPAT
>> if (req->ctx->compat)
>> sr->msg_flags |= MSG_CMSG_COMPAT;
>> @@ -5275,6 +5308,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
>>
>> static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>> {
>> + struct sockaddr_storage address;
>> struct io_sr_msg *sr = &req->sr_msg;
>> struct msghdr msg;
>> struct iovec iov;
>> @@ -5291,10 +5325,20 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>> if (unlikely(ret))
>> return ret;
>>
>> - msg.msg_name = NULL;
>> +
>> msg.msg_control = NULL;
>> msg.msg_controllen = 0;
>> - msg.msg_namelen = 0;
>> + if (sr->addr) {
>> + ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
>> + &address);
>> + if (unlikely(ret < 0))
>> + goto fail;
>> + msg.msg_name = (struct sockaddr *) &address;
>> + msg.msg_namelen = sr->sendto_addr_len;
>> + } else {
>> + msg.msg_name = NULL;
>> + msg.msg_namelen = 0;
>> + }
>>
>> flags = req->sr_msg.msg_flags;
>> if (issue_flags & IO_URING_F_NONBLOCK)
>> @@ -5309,6 +5353,7 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>> return -EAGAIN;
>> if (ret == -ERESTARTSYS)
>> ret = -EINTR;
>> + fail:
>> req_set_fail(req);
>
> I think there is a problem with "fail" goto statement. Not getting
> full clarity on this change. With latest kernel, I see
> req_set_fail(req) inside if check, which I don't see here. Can you
> please resend the patch on latest kernel version. Thanks.

I will send the v5 on top of "for-next" branch in Jens' tree soon.

That is already inside an "if check" anyway. We go to that label when
the move_addr_to_kernel() fails (most of the time it is -EFAULT or
-EINVAL).

That part looks like this (note the if check before the goto):
----------------------------------------------------------------------
msg.msg_control = NULL;
msg.msg_controllen = 0;
if (sr->addr) {
ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
&address);
if (unlikely(ret < 0))
goto fail;
msg.msg_name = (struct sockaddr *) &address;
msg.msg_namelen = sr->sendto_addr_len;
} else {
msg.msg_name = NULL;
msg.msg_namelen = 0;
}

flags = req->sr_msg.msg_flags;
if (issue_flags & IO_URING_F_NONBLOCK)
flags |= MSG_DONTWAIT;
if (flags & MSG_WAITALL)
min_ret = iov_iter_count(&msg.msg_iter);

msg.msg_flags = flags;
ret = sock_sendmsg(sock, &msg);
if (ret < min_ret) {
if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
return -EAGAIN;
if (ret == -ERESTARTSYS)
ret = -EINTR;
fail:
req_set_fail(req);
}
__io_req_complete(req, issue_flags, ret, 0);
return 0;
----------------------------------------------------------------------

>> }
>> __io_req_complete(req, issue_flags, ret, 0);
>> @@ -5427,13 +5472,25 @@ static int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>> return -EINVAL;
>>
>> + /*
>> + * For IORING_OP_RECV{,FROM}, the assignment to @sr->umsg
>> + * is equivalent to an assignment to @sr->buf.
>> + */
>> sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
>> +
>> sr->len = READ_ONCE(sqe->len);
>> sr->bgid = READ_ONCE(sqe->buf_group);
>> sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
>> if (sr->msg_flags & MSG_DONTWAIT)
>> req->flags |= REQ_F_NOWAIT;
>>
>> + if (req->opcode == IORING_OP_RECVFROM) {
>> + sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
>> + sr->recvfrom_addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr3));
>> + } else {
>> + sr->addr = (struct sockaddr __user *) NULL;
>
> I think recvfrom_addr_len should also be pointed to NULL, instead of
> garbage for this case.

Will do in the RFC v5.

>
>> + }
>> +
>> #ifdef CONFIG_COMPAT
>> if (req->ctx->compat)
>> sr->msg_flags |= MSG_CMSG_COMPAT;
>> @@ -5509,6 +5566,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>> struct iovec iov;
>> unsigned flags;
>> int ret, min_ret = 0;
>> + struct sockaddr_storage address;
>> bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
>>
>> sock = sock_from_file(req->file);
>> @@ -5526,7 +5584,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>> if (unlikely(ret))
>> goto out_free;
>>
>> - msg.msg_name = NULL;
>> + msg.msg_name = sr->addr ? (struct sockaddr *) &address : NULL;
>> msg.msg_control = NULL;
>> msg.msg_controllen = 0;
>> msg.msg_namelen = 0;
>
> I think namelen should also be updated ?

It doesn't have to be updated. From net/socket.c there is a comment
like this:

/* We assume all kernel code knows the size of sockaddr_storage */
msg.msg_namelen = 0;

Full __sys_recvfrom() source code, see here:
https://github.com/torvalds/linux/blob/v5.16-rc8/net/socket.c#L2085-L2088

I will add the same comment in next series to clarify this one.

>
>> @@ -5540,6 +5598,16 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>> min_ret = iov_iter_count(&msg.msg_iter);
>>
>> ret = sock_recvmsg(sock, &msg, flags);
>> +
>> + if (ret >= 0 && sr->addr != NULL) {
>> + int tmp;
>> +
>> + tmp = move_addr_to_user(&address, msg.msg_namelen, sr->addr,
>> + sr->recvfrom_addr_len);
>> + if (unlikely(tmp < 0))
>> + ret = tmp;
>> + }
>> +
>> out_free:
>> if (ret < min_ret) {
>> if (ret == -EAGAIN && force_nonblock)
>> @@ -6778,9 +6846,11 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>> case IORING_OP_SYNC_FILE_RANGE:
>> return io_sfr_prep(req, sqe);
>> case IORING_OP_SENDMSG:
>> + case IORING_OP_SENDTO:
>> case IORING_OP_SEND:
>> return io_sendmsg_prep(req, sqe);
>> case IORING_OP_RECVMSG:
>> + case IORING_OP_RECVFROM:
>> case IORING_OP_RECV:
>> return io_recvmsg_prep(req, sqe);
>> case IORING_OP_CONNECT:
>> @@ -7060,12 +7130,14 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>> case IORING_OP_SENDMSG:
>> ret = io_sendmsg(req, issue_flags);
>> break;
>> + case IORING_OP_SENDTO:
>> case IORING_OP_SEND:
>> ret = io_sendto(req, issue_flags);
>> break;
>> case IORING_OP_RECVMSG:
>> ret = io_recvmsg(req, issue_flags);
>> break;
>> + case IORING_OP_RECVFROM:
>> case IORING_OP_RECV:
>> ret = io_recvfrom(req, issue_flags);
>> break;
>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>> index efc7ac9b3a6b..a360069d1e8e 100644
>> --- a/include/uapi/linux/io_uring.h
>> +++ b/include/uapi/linux/io_uring.h
>> @@ -150,6 +150,8 @@ enum {
>> IORING_OP_SETXATTR,
>> IORING_OP_FGETXATTR,
>> IORING_OP_GETXATTR,
>> + IORING_OP_SENDTO,
>> + IORING_OP_RECVFROM,
>>
>> /* this goes last, obviously */
>> IORING_OP_LAST,
>
>
> Regards,
>
> ~Praveen.
>

Thanks for the review. I will send the RFC v5 soon.

--
Ammar Faizi

2022-01-06 20:48:46

by Ammar Faizi

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] io_uring: Add `sendto(2)` and `recvfrom(2)` support

On Fri, 7 Jan 2022 at 03:38:50 +0700, Ammar Faizi <[email protected]> wrote:
> On Thu, 6 Jan 2022 at 23:01:59 +0530, Praveen Kumar <[email protected]> wrote:
>> On 30-12-2021 23:22, Ammar Faizi wrote:
>>> This adds sendto(2) and recvfrom(2) support for io_uring.
>>>
>>> New opcodes:
>>> IORING_OP_SENDTO
>>> IORING_OP_RECVFROM
>>>
>>> Cc: Nugra <[email protected]>
>>> Tested-by: Nugra <[email protected]>
>>> Link: https://github.com/axboe/liburing/issues/397
>>> Signed-off-by: Ammar Faizi <[email protected]>
>>> ---
>>>
>>> v3:
>>> - Fix build error when CONFIG_NET is undefined should be done in
>>> the first patch, not this patch.
>>>
>>> - Add Tested-by tag from Nugra.
>>>
>>> v2:
>>> - In `io_recvfrom()`, mark the error check of `move_addr_to_user()`
>>> call as unlikely.
>>>
>>> - Fix build error when CONFIG_NET is undefined.
>>>
>>> - Added Nugra to CC list (tester).
>>> ---
>>> fs/io_uring.c | 80 +++++++++++++++++++++++++++++++++--
>>> include/uapi/linux/io_uring.h | 2 +
>>> 2 files changed, 78 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 7adcb591398f..3726958f8f58 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -575,7 +575,15 @@ struct io_sr_msg {
>>> union {
>>> struct compat_msghdr __user *umsg_compat;
>>> struct user_msghdr __user *umsg;
>>> - void __user *buf;
>>> +
>>> + struct {
>>> + void __user *buf;
>>> + struct sockaddr __user *addr;
>>> + union {
>>> + int sendto_addr_len;
>>> + int __user *recvfrom_addr_len;
>>> + };
>>> + };
>>> };
>>> int msg_flags;
>>> int bgid;
>>> @@ -1133,6 +1141,19 @@ static const struct io_op_def io_op_defs[] = {
>>> .needs_file = 1
>>> },
>>> [IORING_OP_GETXATTR] = {},
>>> + [IORING_OP_SENDTO] = {
>>> + .needs_file = 1,
>>> + .unbound_nonreg_file = 1,
>>> + .pollout = 1,
>>> + .audit_skip = 1,
>>> + },
>>> + [IORING_OP_RECVFROM] = {
>>> + .needs_file = 1,
>>> + .unbound_nonreg_file = 1,
>>> + .pollin = 1,
>>> + .buffer_select = 1,
>>> + .audit_skip = 1,
>>> + },
>>> };
>>>
>>> /* requests with any of those set should undergo io_disarm_next() */
>>> @@ -5216,12 +5237,24 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>> return -EINVAL;
>>>
>>> + /*
>>> + * For IORING_OP_SEND{,TO}, the assignment to @sr->umsg
>>> + * is equivalent to an assignment to @sr->buf.
>>> + */
>>> sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
>>> +
>>> sr->len = READ_ONCE(sqe->len);
>>> sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
>>> if (sr->msg_flags & MSG_DONTWAIT)
>>> req->flags |= REQ_F_NOWAIT;
>>>
>>> + if (req->opcode == IORING_OP_SENDTO) {
>>> + sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
>>> + sr->sendto_addr_len = READ_ONCE(sqe->addr3);
>>> + } else {
>>> + sr->addr = (struct sockaddr __user *) NULL;
>>
>> Let's have sendto_addr_len = 0
>
> Will do in the RFC v5.
>
>>
>>> + }
>>> +
>>> #ifdef CONFIG_COMPAT
>>> if (req->ctx->compat)
>>> sr->msg_flags |= MSG_CMSG_COMPAT;
>>> @@ -5275,6 +5308,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
>>>
>>> static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>>> {
>>> + struct sockaddr_storage address;
>>> struct io_sr_msg *sr = &req->sr_msg;
>>> struct msghdr msg;
>>> struct iovec iov;
>>> @@ -5291,10 +5325,20 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>>> if (unlikely(ret))
>>> return ret;
>>>
>>> - msg.msg_name = NULL;
>>> +
>>> msg.msg_control = NULL;
>>> msg.msg_controllen = 0;
>>> - msg.msg_namelen = 0;
>>> + if (sr->addr) {
>>> + ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
>>> + &address);
>>> + if (unlikely(ret < 0))
>>> + goto fail;
>>> + msg.msg_name = (struct sockaddr *) &address;
>>> + msg.msg_namelen = sr->sendto_addr_len;
>>> + } else {
>>> + msg.msg_name = NULL;
>>> + msg.msg_namelen = 0;
>>> + }
>>>
>>> flags = req->sr_msg.msg_flags;
>>> if (issue_flags & IO_URING_F_NONBLOCK)
>>> @@ -5309,6 +5353,7 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>>> return -EAGAIN;
>>> if (ret == -ERESTARTSYS)
>>> ret = -EINTR;
>>> + fail:
>>> req_set_fail(req);
>>
>> I think there is a problem with "fail" goto statement. Not getting
>> full clarity on this change. With latest kernel, I see
>> req_set_fail(req) inside if check, which I don't see here. Can you
>> please resend the patch on latest kernel version. Thanks.
>
> I will send the v5 on top of "for-next" branch in Jens' tree soon.
>
> That is already inside an "if check" anyway. We go to that label when
> the move_addr_to_kernel() fails (most of the time it is -EFAULT or
> -EINVAL).
>
> That part looks like this (note the if check before the goto):
> ----------------------------------------------------------------------
> msg.msg_control = NULL;
> msg.msg_controllen = 0;
> if (sr->addr) {
> ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
> &address);
> if (unlikely(ret < 0))
> goto fail;
> msg.msg_name = (struct sockaddr *) &address;
> msg.msg_namelen = sr->sendto_addr_len;
> } else {
> msg.msg_name = NULL;
> msg.msg_namelen = 0;
> }
>
> flags = req->sr_msg.msg_flags;
> if (issue_flags & IO_URING_F_NONBLOCK)
> flags |= MSG_DONTWAIT;
> if (flags & MSG_WAITALL)
> min_ret = iov_iter_count(&msg.msg_iter);
>
> msg.msg_flags = flags;
> ret = sock_sendmsg(sock, &msg);
> if (ret < min_ret) {
> if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
> return -EAGAIN;
> if (ret == -ERESTARTSYS)
> ret = -EINTR;
> fail:
> req_set_fail(req);
> }
> __io_req_complete(req, issue_flags, ret, 0);
> return 0;
> ----------------------------------------------------------------------
>
>>> }
>>> __io_req_complete(req, issue_flags, ret, 0);
>>> @@ -5427,13 +5472,25 @@ static int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>> return -EINVAL;
>>>
>>> + /*
>>> + * For IORING_OP_RECV{,FROM}, the assignment to @sr->umsg
>>> + * is equivalent to an assignment to @sr->buf.
>>> + */
>>> sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
>>> +
>>> sr->len = READ_ONCE(sqe->len);
>>> sr->bgid = READ_ONCE(sqe->buf_group);
>>> sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
>>> if (sr->msg_flags & MSG_DONTWAIT)
>>> req->flags |= REQ_F_NOWAIT;
>>>
>>> + if (req->opcode == IORING_OP_RECVFROM) {
>>> + sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
>>> + sr->recvfrom_addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr3));
>>> + } else {
>>> + sr->addr = (struct sockaddr __user *) NULL;
>>
>> I think recvfrom_addr_len should also be pointed to NULL, instead of
>> garbage for this case.
>
> Will do in the RFC v5.
>
>>
>>> + }
>>> +
>>> #ifdef CONFIG_COMPAT
>>> if (req->ctx->compat)
>>> sr->msg_flags |= MSG_CMSG_COMPAT;
>>> @@ -5509,6 +5566,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>>> struct iovec iov;
>>> unsigned flags;
>>> int ret, min_ret = 0;
>>> + struct sockaddr_storage address;
>>> bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
>>>
>>> sock = sock_from_file(req->file);
>>> @@ -5526,7 +5584,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>>> if (unlikely(ret))
>>> goto out_free;
>>>
>>> - msg.msg_name = NULL;
>>> + msg.msg_name = sr->addr ? (struct sockaddr *) &address : NULL;
>>> msg.msg_control = NULL;
>>> msg.msg_controllen = 0;
>>> msg.msg_namelen = 0;
>>
>> I think namelen should also be updated ?
>
> It doesn't have to be updated. From net/socket.c there is a comment
> like this:
>
> /* We assume all kernel code knows the size of sockaddr_storage */
> msg.msg_namelen = 0;
>
> Full __sys_recvfrom() source code, see here:
> https://github.com/torvalds/linux/blob/v5.16-rc8/net/socket.c#L2085-L2088
>
> I will add the same comment in next series to clarify this one.
>
>>
>>> @@ -5540,6 +5598,16 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>>> min_ret = iov_iter_count(&msg.msg_iter);
>>>
>>> ret = sock_recvmsg(sock, &msg, flags);
>>> +
>>> + if (ret >= 0 && sr->addr != NULL) {
>>> + int tmp;
>>> +
>>> + tmp = move_addr_to_user(&address, msg.msg_namelen, sr->addr,
>>> + sr->recvfrom_addr_len);
>>> + if (unlikely(tmp < 0))
>>> + ret = tmp;
>>> + }
>>> +
>>> out_free:
>>> if (ret < min_ret) {
>>> if (ret == -EAGAIN && force_nonblock)
>>> @@ -6778,9 +6846,11 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>> case IORING_OP_SYNC_FILE_RANGE:
>>> return io_sfr_prep(req, sqe);
>>> case IORING_OP_SENDMSG:
>>> + case IORING_OP_SENDTO:
>>> case IORING_OP_SEND:
>>> return io_sendmsg_prep(req, sqe);
>>> case IORING_OP_RECVMSG:
>>> + case IORING_OP_RECVFROM:
>>> case IORING_OP_RECV:
>>> return io_recvmsg_prep(req, sqe);
>>> case IORING_OP_CONNECT:
>>> @@ -7060,12 +7130,14 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>>> case IORING_OP_SENDMSG:
>>> ret = io_sendmsg(req, issue_flags);
>>> break;
>>> + case IORING_OP_SENDTO:
>>> case IORING_OP_SEND:
>>> ret = io_sendto(req, issue_flags);
>>> break;
>>> case IORING_OP_RECVMSG:
>>> ret = io_recvmsg(req, issue_flags);
>>> break;
>>> + case IORING_OP_RECVFROM:
>>> case IORING_OP_RECV:
>>> ret = io_recvfrom(req, issue_flags);
>>> break;
>>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>>> index efc7ac9b3a6b..a360069d1e8e 100644
>>> --- a/include/uapi/linux/io_uring.h
>>> +++ b/include/uapi/linux/io_uring.h
>>> @@ -150,6 +150,8 @@ enum {
>>> IORING_OP_SETXATTR,
>>> IORING_OP_FGETXATTR,
>>> IORING_OP_GETXATTR,
>>> + IORING_OP_SENDTO,
>>> + IORING_OP_RECVFROM,
>>>
>>> /* this goes last, obviously */
>>> IORING_OP_LAST,
>>
>>
>> Regards,
>>
>> ~Praveen.
>>
>
> Thanks for the review. I will send the RFC v5 soon.
>
> --
> Ammar Faizi

s/v5/v4/g

--
Ammar Faizi

2022-01-07 08:33:54

by Praveen Kumar

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] io_uring: Add `sendto(2)` and `recvfrom(2)` support

On 07-01-2022 02:08, Ammar Faizi wrote:
>
> On Thu, 6 Jan 2022 at 23:01:59 +0530, Praveen Kumar <[email protected]> wrote:
>> On 30-12-2021 23:22, Ammar Faizi wrote:
>>> This adds sendto(2) and recvfrom(2) support for io_uring.
>>>
>>> New opcodes:
>>> IORING_OP_SENDTO
>>> IORING_OP_RECVFROM
>>>
>>> Cc: Nugra <[email protected]>
>>> Tested-by: Nugra <[email protected]>
>>> Link: https://github.com/axboe/liburing/issues/397
>>> Signed-off-by: Ammar Faizi <[email protected]>
>>> ---
>>>
>>> v3:
>>> - Fix build error when CONFIG_NET is undefined should be done in
>>> the first patch, not this patch.
>>>
>>> - Add Tested-by tag from Nugra.
>>>
>>> v2:
>>> - In `io_recvfrom()`, mark the error check of `move_addr_to_user()`
>>> call as unlikely.
>>>
>>> - Fix build error when CONFIG_NET is undefined.
>>>
>>> - Added Nugra to CC list (tester).
>>> ---
>>> fs/io_uring.c | 80 +++++++++++++++++++++++++++++++++--
>>> include/uapi/linux/io_uring.h | 2 +
>>> 2 files changed, 78 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 7adcb591398f..3726958f8f58 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -575,7 +575,15 @@ struct io_sr_msg {
>>> union {
>>> struct compat_msghdr __user *umsg_compat;
>>> struct user_msghdr __user *umsg;
>>> - void __user *buf;
>>> +
>>> + struct {
>>> + void __user *buf;
>>> + struct sockaddr __user *addr;
>>> + union {
>>> + int sendto_addr_len;
>>> + int __user *recvfrom_addr_len;
>>> + };
>>> + };
>>> };
>>> int msg_flags;
>>> int bgid;
>>> @@ -1133,6 +1141,19 @@ static const struct io_op_def io_op_defs[] = {
>>> .needs_file = 1
>>> },
>>> [IORING_OP_GETXATTR] = {},
>>> + [IORING_OP_SENDTO] = {
>>> + .needs_file = 1,
>>> + .unbound_nonreg_file = 1,
>>> + .pollout = 1,
>>> + .audit_skip = 1,
>>> + },
>>> + [IORING_OP_RECVFROM] = {
>>> + .needs_file = 1,
>>> + .unbound_nonreg_file = 1,
>>> + .pollin = 1,
>>> + .buffer_select = 1,
>>> + .audit_skip = 1,
>>> + },
>>> };
>>>
>>> /* requests with any of those set should undergo io_disarm_next() */
>>> @@ -5216,12 +5237,24 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>> return -EINVAL;
>>>
>>> + /*
>>> + * For IORING_OP_SEND{,TO}, the assignment to @sr->umsg
>>> + * is equivalent to an assignment to @sr->buf.
>>> + */
>>> sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
>>> +
>>> sr->len = READ_ONCE(sqe->len);
>>> sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
>>> if (sr->msg_flags & MSG_DONTWAIT)
>>> req->flags |= REQ_F_NOWAIT;
>>>
>>> + if (req->opcode == IORING_OP_SENDTO) {
>>> + sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
>>> + sr->sendto_addr_len = READ_ONCE(sqe->addr3);
>>> + } else {
>>> + sr->addr = (struct sockaddr __user *) NULL;
>>
>> Let's have sendto_addr_len = 0
>
> Will do in the RFC v5.
>
>>
>>> + }
>>> +
>>> #ifdef CONFIG_COMPAT
>>> if (req->ctx->compat)
>>> sr->msg_flags |= MSG_CMSG_COMPAT;
>>> @@ -5275,6 +5308,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
>>>
>>> static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>>> {
>>> + struct sockaddr_storage address;
>>> struct io_sr_msg *sr = &req->sr_msg;
>>> struct msghdr msg;
>>> struct iovec iov;
>>> @@ -5291,10 +5325,20 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>>> if (unlikely(ret))
>>> return ret;
>>>
>>> - msg.msg_name = NULL;
>>> +
>>> msg.msg_control = NULL;
>>> msg.msg_controllen = 0;
>>> - msg.msg_namelen = 0;
>>> + if (sr->addr) {
>>> + ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
>>> + &address);
>>> + if (unlikely(ret < 0))
>>> + goto fail;
>>> + msg.msg_name = (struct sockaddr *) &address;
>>> + msg.msg_namelen = sr->sendto_addr_len;
>>> + } else {
>>> + msg.msg_name = NULL;
>>> + msg.msg_namelen = 0;
>>> + }
>>>
>>> flags = req->sr_msg.msg_flags;
>>> if (issue_flags & IO_URING_F_NONBLOCK)
>>> @@ -5309,6 +5353,7 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>>> return -EAGAIN;
>>> if (ret == -ERESTARTSYS)
>>> ret = -EINTR;
>>> + fail:
>>> req_set_fail(req);
>>
>> I think there is a problem with "fail" goto statement. Not getting
>> full clarity on this change. With latest kernel, I see
>> req_set_fail(req) inside if check, which I don't see here. Can you
>> please resend the patch on latest kernel version. Thanks.
>
> I will send the v5 on top of "for-next" branch in Jens' tree soon.
>
> That is already inside an "if check" anyway. We go to that label when
> the move_addr_to_kernel() fails (most of the time it is -EFAULT or
> -EINVAL).
>
> That part looks like this (note the if check before the goto):
> ----------------------------------------------------------------------
> msg.msg_control = NULL;
> msg.msg_controllen = 0;
> if (sr->addr) {
> ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
> &address);
> if (unlikely(ret < 0))
> goto fail;
> msg.msg_name = (struct sockaddr *) &address;
> msg.msg_namelen = sr->sendto_addr_len;
> } else {
> msg.msg_name = NULL;
> msg.msg_namelen = 0;
> }
>
> flags = req->sr_msg.msg_flags;
> if (issue_flags & IO_URING_F_NONBLOCK)
> flags |= MSG_DONTWAIT;
> if (flags & MSG_WAITALL)
> min_ret = iov_iter_count(&msg.msg_iter);
>
> msg.msg_flags = flags;
> ret = sock_sendmsg(sock, &msg);
> if (ret < min_ret) {
> if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
> return -EAGAIN;
> if (ret == -ERESTARTSYS)
> ret = -EINTR;
> fail:

Thanks for sending this. IMO this goto label should be just before the "if (ret < min_ret)" statement.

> req_set_fail(req);
> }
> __io_req_complete(req, issue_flags, ret, 0);
> return 0;
> ----------------------------------------------------------------------
>
>>> }
>>> __io_req_complete(req, issue_flags, ret, 0);
>>> @@ -5427,13 +5472,25 @@ static int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>> return -EINVAL;
>>>
>>> + /*
>>> + * For IORING_OP_RECV{,FROM}, the assignment to @sr->umsg
>>> + * is equivalent to an assignment to @sr->buf.
>>> + */
>>> sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
>>> +
>>> sr->len = READ_ONCE(sqe->len);
>>> sr->bgid = READ_ONCE(sqe->buf_group);
>>> sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
>>> if (sr->msg_flags & MSG_DONTWAIT)
>>> req->flags |= REQ_F_NOWAIT;
>>>
>>> + if (req->opcode == IORING_OP_RECVFROM) {
>>> + sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
>>> + sr->recvfrom_addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr3));
>>> + } else {
>>> + sr->addr = (struct sockaddr __user *) NULL;
>>
>> I think recvfrom_addr_len should also be pointed to NULL, instead of
>> garbage for this case.
>
> Will do in the RFC v5.
>
>>
>>> + }
>>> +
>>> #ifdef CONFIG_COMPAT
>>> if (req->ctx->compat)
>>> sr->msg_flags |= MSG_CMSG_COMPAT;
>>> @@ -5509,6 +5566,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>>> struct iovec iov;
>>> unsigned flags;
>>> int ret, min_ret = 0;
>>> + struct sockaddr_storage address;
>>> bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
>>>
>>> sock = sock_from_file(req->file);
>>> @@ -5526,7 +5584,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>>> if (unlikely(ret))
>>> goto out_free;
>>>
>>> - msg.msg_name = NULL;
>>> + msg.msg_name = sr->addr ? (struct sockaddr *) &address : NULL;
>>> msg.msg_control = NULL;
>>> msg.msg_controllen = 0;
>>> msg.msg_namelen = 0;
>>
>> I think namelen should also be updated ?
>
> It doesn't have to be updated. From net/socket.c there is a comment
> like this:
>
> /* We assume all kernel code knows the size of sockaddr_storage */
> msg.msg_namelen = 0;
>
> Full __sys_recvfrom() source code, see here:
> https://github.com/torvalds/linux/blob/v5.16-rc8/net/socket.c#L2085-L2088
>
> I will add the same comment in next series to clarify this one.
>
>>
>>> @@ -5540,6 +5598,16 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>>> min_ret = iov_iter_count(&msg.msg_iter);
>>>
>>> ret = sock_recvmsg(sock, &msg, flags);
>>> +
>>> + if (ret >= 0 && sr->addr != NULL) {
>>> + int tmp;
>>> +
>>> + tmp = move_addr_to_user(&address, msg.msg_namelen, sr->addr,
>>> + sr->recvfrom_addr_len);
>>> + if (unlikely(tmp < 0))
>>> + ret = tmp;
>>> + }
>>> +
>>> out_free:
>>> if (ret < min_ret) {
>>> if (ret == -EAGAIN && force_nonblock)
>>> @@ -6778,9 +6846,11 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>> case IORING_OP_SYNC_FILE_RANGE:
>>> return io_sfr_prep(req, sqe);
>>> case IORING_OP_SENDMSG:
>>> + case IORING_OP_SENDTO:
>>> case IORING_OP_SEND:
>>> return io_sendmsg_prep(req, sqe);
>>> case IORING_OP_RECVMSG:
>>> + case IORING_OP_RECVFROM:
>>> case IORING_OP_RECV:
>>> return io_recvmsg_prep(req, sqe);
>>> case IORING_OP_CONNECT:
>>> @@ -7060,12 +7130,14 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>>> case IORING_OP_SENDMSG:
>>> ret = io_sendmsg(req, issue_flags);
>>> break;
>>> + case IORING_OP_SENDTO:
>>> case IORING_OP_SEND:
>>> ret = io_sendto(req, issue_flags);
>>> break;
>>> case IORING_OP_RECVMSG:
>>> ret = io_recvmsg(req, issue_flags);
>>> break;
>>> + case IORING_OP_RECVFROM:
>>> case IORING_OP_RECV:
>>> ret = io_recvfrom(req, issue_flags);
>>> break;
>>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>>> index efc7ac9b3a6b..a360069d1e8e 100644
>>> --- a/include/uapi/linux/io_uring.h
>>> +++ b/include/uapi/linux/io_uring.h
>>> @@ -150,6 +150,8 @@ enum {
>>> IORING_OP_SETXATTR,
>>> IORING_OP_FGETXATTR,
>>> IORING_OP_GETXATTR,
>>> + IORING_OP_SENDTO,
>>> + IORING_OP_RECVFROM,
>>>
>>> /* this goes last, obviously */
>>> IORING_OP_LAST,
>>
>>
>> Regards,
>>
>> ~Praveen.
>>
>
> Thanks for the review. I will send the RFC v5 soon.
>


2022-01-07 11:02:16

by Ammar Faizi

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/3] io_uring: Add `sendto(2)` and `recvfrom(2)` support

On 1/7/22 3:33 PM, Praveen Kumar wrote:
> On 07-01-2022 02:08, Ammar Faizi wrote:
>> On Thu, 6 Jan 2022 at 23:01:59 +0530, Praveen Kumar <[email protected]> wrote:
>>> On 30-12-2021 23:22, Ammar Faizi wrote:
>>>> This adds sendto(2) and recvfrom(2) support for io_uring.
>>>>
>>>> New opcodes:
>>>> IORING_OP_SENDTO
>>>> IORING_OP_RECVFROM
>>>>
>>>> Cc: Nugra <[email protected]>
>>>> Tested-by: Nugra <[email protected]>
>>>> Link: https://github.com/axboe/liburing/issues/397
>>>> Signed-off-by: Ammar Faizi <[email protected]>
>>>> ---
>>>>
>>>> v3:
>>>> - Fix build error when CONFIG_NET is undefined should be done in
>>>> the first patch, not this patch.
>>>>
>>>> - Add Tested-by tag from Nugra.
>>>>
>>>> v2:
>>>> - In `io_recvfrom()`, mark the error check of `move_addr_to_user()`
>>>> call as unlikely.
>>>>
>>>> - Fix build error when CONFIG_NET is undefined.
>>>>
>>>> - Added Nugra to CC list (tester).
>>>> ---
>>>> fs/io_uring.c | 80 +++++++++++++++++++++++++++++++++--
>>>> include/uapi/linux/io_uring.h | 2 +
>>>> 2 files changed, 78 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index 7adcb591398f..3726958f8f58 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -575,7 +575,15 @@ struct io_sr_msg {
>>>> union {
>>>> struct compat_msghdr __user *umsg_compat;
>>>> struct user_msghdr __user *umsg;
>>>> - void __user *buf;
>>>> +
>>>> + struct {
>>>> + void __user *buf;
>>>> + struct sockaddr __user *addr;
>>>> + union {
>>>> + int sendto_addr_len;
>>>> + int __user *recvfrom_addr_len;
>>>> + };
>>>> + };
>>>> };
>>>> int msg_flags;
>>>> int bgid;
>>>> @@ -1133,6 +1141,19 @@ static const struct io_op_def io_op_defs[] = {
>>>> .needs_file = 1
>>>> },
>>>> [IORING_OP_GETXATTR] = {},
>>>> + [IORING_OP_SENDTO] = {
>>>> + .needs_file = 1,
>>>> + .unbound_nonreg_file = 1,
>>>> + .pollout = 1,
>>>> + .audit_skip = 1,
>>>> + },
>>>> + [IORING_OP_RECVFROM] = {
>>>> + .needs_file = 1,
>>>> + .unbound_nonreg_file = 1,
>>>> + .pollin = 1,
>>>> + .buffer_select = 1,
>>>> + .audit_skip = 1,
>>>> + },
>>>> };
>>>>
>>>> /* requests with any of those set should undergo io_disarm_next() */
>>>> @@ -5216,12 +5237,24 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>>> return -EINVAL;
>>>>
>>>> + /*
>>>> + * For IORING_OP_SEND{,TO}, the assignment to @sr->umsg
>>>> + * is equivalent to an assignment to @sr->buf.
>>>> + */
>>>> sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
>>>> +
>>>> sr->len = READ_ONCE(sqe->len);
>>>> sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
>>>> if (sr->msg_flags & MSG_DONTWAIT)
>>>> req->flags |= REQ_F_NOWAIT;
>>>>
>>>> + if (req->opcode == IORING_OP_SENDTO) {
>>>> + sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
>>>> + sr->sendto_addr_len = READ_ONCE(sqe->addr3);
>>>> + } else {
>>>> + sr->addr = (struct sockaddr __user *) NULL;
>>>
>>> Let's have sendto_addr_len = 0
>>
>> Will do in the RFC v5.
>>
>>>
>>>> + }
>>>> +
>>>> #ifdef CONFIG_COMPAT
>>>> if (req->ctx->compat)
>>>> sr->msg_flags |= MSG_CMSG_COMPAT;
>>>> @@ -5275,6 +5308,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
>>>>
>>>> static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>>>> {
>>>> + struct sockaddr_storage address;
>>>> struct io_sr_msg *sr = &req->sr_msg;
>>>> struct msghdr msg;
>>>> struct iovec iov;
>>>> @@ -5291,10 +5325,20 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>>>> if (unlikely(ret))
>>>> return ret;
>>>>
>>>> - msg.msg_name = NULL;
>>>> +
>>>> msg.msg_control = NULL;
>>>> msg.msg_controllen = 0;
>>>> - msg.msg_namelen = 0;
>>>> + if (sr->addr) {
>>>> + ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
>>>> + &address);
>>>> + if (unlikely(ret < 0))
>>>> + goto fail;
>>>> + msg.msg_name = (struct sockaddr *) &address;
>>>> + msg.msg_namelen = sr->sendto_addr_len;
>>>> + } else {
>>>> + msg.msg_name = NULL;
>>>> + msg.msg_namelen = 0;
>>>> + }
>>>>
>>>> flags = req->sr_msg.msg_flags;
>>>> if (issue_flags & IO_URING_F_NONBLOCK)
>>>> @@ -5309,6 +5353,7 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>>>> return -EAGAIN;
>>>> if (ret == -ERESTARTSYS)
>>>> ret = -EINTR;
>>>> + fail:
>>>> req_set_fail(req);
>>>
>>> I think there is a problem with "fail" goto statement. Not getting
>>> full clarity on this change. With latest kernel, I see
>>> req_set_fail(req) inside if check, which I don't see here. Can you
>>> please resend the patch on latest kernel version. Thanks.
>>
>> I will send the v5 on top of "for-next" branch in Jens' tree soon.
>>
>> That is already inside an "if check" anyway. We go to that label when
>> the move_addr_to_kernel() fails (most of the time it is -EFAULT or
>> -EINVAL).
>>
>> That part looks like this (note the if check before the goto):
>> ----------------------------------------------------------------------
>> msg.msg_control = NULL;
>> msg.msg_controllen = 0;
>> if (sr->addr) {
>> ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
>> &address);
>> if (unlikely(ret < 0))
>> goto fail;
>> msg.msg_name = (struct sockaddr *) &address;
>> msg.msg_namelen = sr->sendto_addr_len;
>> } else {
>> msg.msg_name = NULL;
>> msg.msg_namelen = 0;
>> }
>>
>> flags = req->sr_msg.msg_flags;
>> if (issue_flags & IO_URING_F_NONBLOCK)
>> flags |= MSG_DONTWAIT;
>> if (flags & MSG_WAITALL)
>> min_ret = iov_iter_count(&msg.msg_iter);
>>
>> msg.msg_flags = flags;
>> ret = sock_sendmsg(sock, &msg);
>> if (ret < min_ret) {
>> if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
>> return -EAGAIN;
>> if (ret == -ERESTARTSYS)
>> ret = -EINTR;
>> fail:
>
> Thanks for sending this. IMO this goto label should be just before
> the "if (ret < min_ret)" statement.

AFAICT, error returned by move_addr_to_kernel are only -EFAULT and -EINVAL,
so the check against -EAGAIN and -ERESTARTSYS is unnecessary for that
case. We can skip that.

RFC v4 here (rebased on top of "for-next" Jens' tree):
https://lore.kernel.org/io-uring/[email protected]/T/

>
>> req_set_fail(req);
>> }
>> __io_req_complete(req, issue_flags, ret, 0);
>> return 0;
>> ----------------------------------------------------------------------
>>
>>>> }
>>>> __io_req_complete(req, issue_flags, ret, 0);
>>>> @@ -5427,13 +5472,25 @@ static int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>>> return -EINVAL;
>>>>
>>>> + /*
>>>> + * For IORING_OP_RECV{,FROM}, the assignment to @sr->umsg
>>>> + * is equivalent to an assignment to @sr->buf.
>>>> + */
>>>> sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
>>>> +
>>>> sr->len = READ_ONCE(sqe->len);
>>>> sr->bgid = READ_ONCE(sqe->buf_group);
>>>> sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
>>>> if (sr->msg_flags & MSG_DONTWAIT)
>>>> req->flags |= REQ_F_NOWAIT;
>>>>
>>>> + if (req->opcode == IORING_OP_RECVFROM) {
>>>> + sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
>>>> + sr->recvfrom_addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr3));
>>>> + } else {
>>>> + sr->addr = (struct sockaddr __user *) NULL;
>>>
>>> I think recvfrom_addr_len should also be pointed to NULL, instead of
>>> garbage for this case.
>>
>> Will do in the RFC v5.
>>
>>>
>>>> + }
>>>> +
>>>> #ifdef CONFIG_COMPAT
>>>> if (req->ctx->compat)
>>>> sr->msg_flags |= MSG_CMSG_COMPAT;
>>>> @@ -5509,6 +5566,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>>>> struct iovec iov;
>>>> unsigned flags;
>>>> int ret, min_ret = 0;
>>>> + struct sockaddr_storage address;
>>>> bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
>>>>
>>>> sock = sock_from_file(req->file);
>>>> @@ -5526,7 +5584,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>>>> if (unlikely(ret))
>>>> goto out_free;
>>>>
>>>> - msg.msg_name = NULL;
>>>> + msg.msg_name = sr->addr ? (struct sockaddr *) &address : NULL;
>>>> msg.msg_control = NULL;
>>>> msg.msg_controllen = 0;
>>>> msg.msg_namelen = 0;
>>>
>>> I think namelen should also be updated ?
>>
>> It doesn't have to be updated. From net/socket.c there is a comment
>> like this:
>>
>> /* We assume all kernel code knows the size of sockaddr_storage */
>> msg.msg_namelen = 0;
>>
>> Full __sys_recvfrom() source code, see here:
>> https://github.com/torvalds/linux/blob/v5.16-rc8/net/socket.c#L2085-L2088
>>
>> I will add the same comment in next series to clarify this one.
>>
>>>
>>>> @@ -5540,6 +5598,16 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>>>> min_ret = iov_iter_count(&msg.msg_iter);
>>>>
>>>> ret = sock_recvmsg(sock, &msg, flags);
>>>> +
>>>> + if (ret >= 0 && sr->addr != NULL) {
>>>> + int tmp;
>>>> +
>>>> + tmp = move_addr_to_user(&address, msg.msg_namelen, sr->addr,
>>>> + sr->recvfrom_addr_len);
>>>> + if (unlikely(tmp < 0))
>>>> + ret = tmp;
>>>> + }
>>>> +
>>>> out_free:
>>>> if (ret < min_ret) {
>>>> if (ret == -EAGAIN && force_nonblock)
>>>> @@ -6778,9 +6846,11 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>> case IORING_OP_SYNC_FILE_RANGE:
>>>> return io_sfr_prep(req, sqe);
>>>> case IORING_OP_SENDMSG:
>>>> + case IORING_OP_SENDTO:
>>>> case IORING_OP_SEND:
>>>> return io_sendmsg_prep(req, sqe);
>>>> case IORING_OP_RECVMSG:
>>>> + case IORING_OP_RECVFROM:
>>>> case IORING_OP_RECV:
>>>> return io_recvmsg_prep(req, sqe);
>>>> case IORING_OP_CONNECT:
>>>> @@ -7060,12 +7130,14 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>>>> case IORING_OP_SENDMSG:
>>>> ret = io_sendmsg(req, issue_flags);
>>>> break;
>>>> + case IORING_OP_SENDTO:
>>>> case IORING_OP_SEND:
>>>> ret = io_sendto(req, issue_flags);
>>>> break;
>>>> case IORING_OP_RECVMSG:
>>>> ret = io_recvmsg(req, issue_flags);
>>>> break;
>>>> + case IORING_OP_RECVFROM:
>>>> case IORING_OP_RECV:
>>>> ret = io_recvfrom(req, issue_flags);
>>>> break;
>>>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>>>> index efc7ac9b3a6b..a360069d1e8e 100644
>>>> --- a/include/uapi/linux/io_uring.h
>>>> +++ b/include/uapi/linux/io_uring.h
>>>> @@ -150,6 +150,8 @@ enum {
>>>> IORING_OP_SETXATTR,
>>>> IORING_OP_FGETXATTR,
>>>> IORING_OP_GETXATTR,
>>>> + IORING_OP_SENDTO,
>>>> + IORING_OP_RECVFROM,
>>>>
>>>> /* this goes last, obviously */
>>>> IORING_OP_LAST,
>>>
>>>
>>> Regards,
>>>
>>> ~Praveen.
>>>
>>
>> Thanks for the review. I will send the RFC v5 soon.
>>
>

--
Ammar Faizi


Attachments:
OpenPGP_0x364FBA34FF170A4B.asc (1.72 kB)
OpenPGP public key
OpenPGP_signature (495.00 B)
OpenPGP digital signature
Download all attachments