Hello,
This RFC patchset adds sendto(2) and recvfrom(2) support for io_uring.
It also addresses an issue in the liburing GitHub repository [1].
## Motivations:
1) By using `sendto()` and `recvfrom()` we can make the submission
simpler compared to always using `sendmsg()` and `recvmsg()` from
the userspace.
2) There is a historical patch that tried to add the same
functionality, but did not end up being applied. [2]
On Tue, 7 Jul 2020 12:29:18 -0600, Jens Axboe <[email protected]> wrote:
> In a private conversation with the author, a good point was brought
> up that the sendto/recvfrom do not require an allocation of an async
> context, if we need to defer or go async with the request. I think
> that's a major win, to be honest. There are other benefits as well
> (like shorter path), but to me, the async less part is nice and will
> reduce overhead
## Changes summary
There are 3 patches in this series.
PATCH 1/3 renames io_recv to io_recvfrom and io_send to io_sendto.
Note that
send(sockfd, buf, len, flags);
is equivalent to
sendto(sockfd, buf, len, flags, NULL, 0);
and
recv(sockfd, buf, len, flags);
is equivalent to
recvfrom(sockfd, buf, len, flags, NULL, NULL);
So it is saner to have `send` and `recv` directed to `sendto` and
`recvfrom` instead of the opposite with respect to the name.
PATCH 2/3 makes `move_addr_to_user()` be a non static function. This
function lives in net/socket.c, we need to call this from io_uring
to add `recvfrom()` support for liburing. Added net files maintainers
to the CC list.
PATCH 3/3 adds `sendto(2)` and `recvfrom(2)` support for io_uring.
Added two new opcodes: IORING_OP_SENDTO and IORING_OP_RECVFROM.
## How to test
This patchset is based on "for-next" branch commit:
aafc6e6eba29c890b0031267fc37c43490447c81 ("Merge branch 'for-5.17/io_uring' into for-next")
It is also available in the Git repository at:
https://github.com/ammarfaizi2/linux-block ammarfaizi2/linux-block/io_uring-recvfrom-sendto
I also added the liburing support and test. The liburing support is
based on "xattr-getdents64" branch commit:
18d71076f6c97e1b25aa0e3b0e12a913ec4717fa ("src/include/liburing.h: style cleanups")
It is available in the Git repository at:
https://github.com/ammarfaizi2/liburing sendto-recvfrom
Link: https://github.com/axboe/liburing/issues/397 [1]
Link: https://lore.kernel.org/io-uring/[email protected]/ [2]
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: [email protected]
Signed-off-by: Ammar Faizi <[email protected]>
---
Ammar Faizi (3):
io_uring: Rename `io_{send,recv}` to `io_{sendto,recvfrom}`
net: Make `move_addr_to_user()` be a non static function
io_uring: Add `sendto(2)` and `recvfrom(2)` support
fs/io_uring.c | 88 +++++++++++++++++++++++++++++++----
include/linux/socket.h | 2 +
include/uapi/linux/io_uring.h | 2 +
net/socket.c | 4 +-
4 files changed, 86 insertions(+), 10 deletions(-)
base-commit: aafc6e6eba29c890b0031267fc37c43490447c81
--
Ammar Faizi
Currently we can perform `send` and `recv` via io_uring. And now, we
are going to add `sendto` and `recvfrom` support for io_uring.
Note that:
Calling `send(fd, buf, len, flags)` is equivalent to calling
`sendto(fd, buf, len, flags, NULL, 0)`. Therefore, `sendto`
is a superset of `send`.
Calling `recv(fd, buf, len, flags)` is equivalent to calling
`recvfrom(fd, buf, len, flags, NULL, NULL)`. Therefore, `recvfrom`
is a superset of `recv`.
As such, let's direct the current supported `IORING_OP_{SEND,RECV}` to
`io_{sendto,recvfrom}`. These functions will also be used for
`IORING_OP_{SENDTO,RECVFROM}` operation in the next patches.
Signed-off-by: Ammar Faizi <[email protected]>
---
fs/io_uring.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 90002bb3fdf4..d564f98d5d3b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5273,7 +5273,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
return 0;
}
-static int io_send(struct io_kiocb *req, unsigned int issue_flags)
+static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_sr_msg *sr = &req->sr_msg;
struct msghdr msg;
@@ -5499,7 +5499,7 @@ static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
return 0;
}
-static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
+static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_buffer *kbuf;
struct io_sr_msg *sr = &req->sr_msg;
@@ -7061,13 +7061,13 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
ret = io_sendmsg(req, issue_flags);
break;
case IORING_OP_SEND:
- ret = io_send(req, issue_flags);
+ ret = io_sendto(req, issue_flags);
break;
case IORING_OP_RECVMSG:
ret = io_recvmsg(req, issue_flags);
break;
case IORING_OP_RECV:
- ret = io_recv(req, issue_flags);
+ ret = io_recvfrom(req, issue_flags);
break;
case IORING_OP_TIMEOUT:
ret = io_timeout(req, issue_flags);
--
2.32.0
In order to add recvfrom support for io_uring, we need to call
`move_addr_to_user()` in fs/io_uring.c.
This makes `move_addr_to_user()` be a non static function so we can
call it from io_uring.
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: [email protected]
Signed-off-by: Ammar Faizi <[email protected]>
---
include/linux/socket.h | 2 ++
net/socket.c | 4 ++--
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 8ef26d89ef49..0d0bc1ace50c 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -371,6 +371,8 @@ struct ucred {
#define IPX_TYPE 1
extern int move_addr_to_kernel(void __user *uaddr, int ulen, struct sockaddr_storage *kaddr);
+extern int move_addr_to_user(struct sockaddr_storage *kaddr, int klen,
+ void __user *uaddr, int __user *ulen);
extern int put_cmsg(struct msghdr*, int level, int type, int len, void *data);
struct timespec64;
diff --git a/net/socket.c b/net/socket.c
index 7f64a6eccf63..af521d351c8a 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -267,8 +267,8 @@ int move_addr_to_kernel(void __user *uaddr, int ulen, struct sockaddr_storage *k
* specified. Zero is returned for a success.
*/
-static int move_addr_to_user(struct sockaddr_storage *kaddr, int klen,
- void __user *uaddr, int __user *ulen)
+int move_addr_to_user(struct sockaddr_storage *kaddr, int klen,
+ void __user *uaddr, int __user *ulen)
{
int err;
int len;
--
2.32.0
This adds sendto(2) and recvfrom(2) support for io_uring.
New opcodes:
IORING_OP_SENDTO
IORING_OP_RECVFROM
Signed-off-by: Ammar Faizi <[email protected]>
---
fs/io_uring.c | 80 +++++++++++++++++++++++++++++++++--
include/uapi/linux/io_uring.h | 2 +
2 files changed, 78 insertions(+), 4 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index d564f98d5d3b..4406c044798f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -575,7 +575,15 @@ struct io_sr_msg {
union {
struct compat_msghdr __user *umsg_compat;
struct user_msghdr __user *umsg;
- void __user *buf;
+
+ struct {
+ void __user *buf;
+ struct sockaddr __user *addr;
+ union {
+ int sendto_addr_len;
+ int __user *recvfrom_addr_len;
+ };
+ };
};
int msg_flags;
int bgid;
@@ -1133,6 +1141,19 @@ static const struct io_op_def io_op_defs[] = {
.needs_file = 1
},
[IORING_OP_GETXATTR] = {},
+ [IORING_OP_SENDTO] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollout = 1,
+ .audit_skip = 1,
+ },
+ [IORING_OP_RECVFROM] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollin = 1,
+ .buffer_select = 1,
+ .audit_skip = 1,
+ },
};
/* requests with any of those set should undergo io_disarm_next() */
@@ -5216,12 +5237,24 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
return -EINVAL;
+ /*
+ * For IORING_OP_SEND{,TO}, the assignment to @sr->umsg
+ * is equivalent to an assignment to @sr->buf.
+ */
sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
+
sr->len = READ_ONCE(sqe->len);
sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
if (sr->msg_flags & MSG_DONTWAIT)
req->flags |= REQ_F_NOWAIT;
+ if (req->opcode == IORING_OP_SENDTO) {
+ sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+ sr->sendto_addr_len = READ_ONCE(sqe->addr3);
+ } else {
+ sr->addr = (struct sockaddr __user *) NULL;
+ }
+
#ifdef CONFIG_COMPAT
if (req->ctx->compat)
sr->msg_flags |= MSG_CMSG_COMPAT;
@@ -5275,6 +5308,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
{
+ struct sockaddr_storage address;
struct io_sr_msg *sr = &req->sr_msg;
struct msghdr msg;
struct iovec iov;
@@ -5291,10 +5325,20 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
if (unlikely(ret))
return ret;
- msg.msg_name = NULL;
+
msg.msg_control = NULL;
msg.msg_controllen = 0;
- msg.msg_namelen = 0;
+ if (sr->addr) {
+ ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
+ &address);
+ if (unlikely(ret < 0))
+ goto fail;
+ msg.msg_name = (struct sockaddr *) &address;
+ msg.msg_namelen = sr->sendto_addr_len;
+ } else {
+ msg.msg_name = NULL;
+ msg.msg_namelen = 0;
+ }
flags = req->sr_msg.msg_flags;
if (issue_flags & IO_URING_F_NONBLOCK)
@@ -5309,6 +5353,7 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
return -EAGAIN;
if (ret == -ERESTARTSYS)
ret = -EINTR;
+ fail:
req_set_fail(req);
}
__io_req_complete(req, issue_flags, ret, 0);
@@ -5427,13 +5472,25 @@ static int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
return -EINVAL;
+ /*
+ * For IORING_OP_RECV{,FROM}, the assignment to @sr->umsg
+ * is equivalent to an assignment to @sr->buf.
+ */
sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
+
sr->len = READ_ONCE(sqe->len);
sr->bgid = READ_ONCE(sqe->buf_group);
sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
if (sr->msg_flags & MSG_DONTWAIT)
req->flags |= REQ_F_NOWAIT;
+ if (req->opcode == IORING_OP_RECVFROM) {
+ sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+ sr->recvfrom_addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr3));
+ } else {
+ sr->addr = (struct sockaddr __user *) NULL;
+ }
+
#ifdef CONFIG_COMPAT
if (req->ctx->compat)
sr->msg_flags |= MSG_CMSG_COMPAT;
@@ -5509,6 +5566,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
struct iovec iov;
unsigned flags;
int ret, min_ret = 0;
+ struct sockaddr_storage address;
bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
sock = sock_from_file(req->file);
@@ -5526,7 +5584,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
if (unlikely(ret))
goto out_free;
- msg.msg_name = NULL;
+ msg.msg_name = sr->addr ? (struct sockaddr *) &address : NULL;
msg.msg_control = NULL;
msg.msg_controllen = 0;
msg.msg_namelen = 0;
@@ -5540,6 +5598,16 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
min_ret = iov_iter_count(&msg.msg_iter);
ret = sock_recvmsg(sock, &msg, flags);
+
+ if (ret >= 0 && sr->addr != NULL) {
+ int tmp;
+
+ tmp = move_addr_to_user(&address, msg.msg_namelen, sr->addr,
+ sr->recvfrom_addr_len);
+ if (tmp < 0)
+ ret = tmp;
+ }
+
out_free:
if (ret < min_ret) {
if (ret == -EAGAIN && force_nonblock)
@@ -6778,9 +6846,11 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
case IORING_OP_SYNC_FILE_RANGE:
return io_sfr_prep(req, sqe);
case IORING_OP_SENDMSG:
+ case IORING_OP_SENDTO:
case IORING_OP_SEND:
return io_sendmsg_prep(req, sqe);
case IORING_OP_RECVMSG:
+ case IORING_OP_RECVFROM:
case IORING_OP_RECV:
return io_recvmsg_prep(req, sqe);
case IORING_OP_CONNECT:
@@ -7060,12 +7130,14 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
case IORING_OP_SENDMSG:
ret = io_sendmsg(req, issue_flags);
break;
+ case IORING_OP_SENDTO:
case IORING_OP_SEND:
ret = io_sendto(req, issue_flags);
break;
case IORING_OP_RECVMSG:
ret = io_recvmsg(req, issue_flags);
break;
+ case IORING_OP_RECVFROM:
case IORING_OP_RECV:
ret = io_recvfrom(req, issue_flags);
break;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index efc7ac9b3a6b..a360069d1e8e 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -150,6 +150,8 @@ enum {
IORING_OP_SETXATTR,
IORING_OP_FGETXATTR,
IORING_OP_GETXATTR,
+ IORING_OP_SENDTO,
+ IORING_OP_RECVFROM,
/* this goes last, obviously */
IORING_OP_LAST,
--
2.32.0
Hello,
This RFC patchset adds sendto(2) and recvfrom(2) support for io_uring.
It also addresses an issue in the liburing GitHub repository [1].
## Motivations:
1) By using `sendto()` and `recvfrom()` we can make the submission
simpler compared to always using `sendmsg()` and `recvmsg()` from
the userspace.
2) There is a historical patch that tried to add the same
functionality, but did not end up being applied. [2]
On Tue, 7 Jul 2020 12:29:18 -0600, Jens Axboe <[email protected]> wrote:
> In a private conversation with the author, a good point was brought
> up that the sendto/recvfrom do not require an allocation of an async
> context, if we need to defer or go async with the request. I think
> that's a major win, to be honest. There are other benefits as well
> (like shorter path), but to me, the async less part is nice and will
> reduce overhead
## Changes summary
There are 3 patches in this series.
PATCH 1/3 renames io_recv to io_recvfrom and io_send to io_sendto.
Note that
send(sockfd, buf, len, flags);
is equivalent to
sendto(sockfd, buf, len, flags, NULL, 0);
and
recv(sockfd, buf, len, flags);
is equivalent to
recvfrom(sockfd, buf, len, flags, NULL, NULL);
So it is saner to have `send` and `recv` directed to `sendto` and
`recvfrom` instead of the opposite with respect to the name.
PATCH 2/3 makes `move_addr_to_user()` be a non static function. This
function lives in net/socket.c, we need to call this from io_uring
to add `recvfrom()` support for liburing. Added net files maintainers
to the CC list.
PATCH 3/3 adds `sendto(2)` and `recvfrom(2)` support for io_uring.
Added two new opcodes: IORING_OP_SENDTO and IORING_OP_RECVFROM.
## How to test
This patchset is based on "for-next" branch commit:
bb3294e22482db4b7ec7cfbb2d0f5b53c1adcf86 ("Merge branch 'for-5.17/drivers' into for-next")
It is also available in the Git repository at:
https://github.com/ammarfaizi2/linux-block ammarfaizi2/linux-block/io_uring-recvfrom-sendto
I also added the liburing support and test. The liburing support is
based on "xattr-getdents64" branch commit:
55a9bf979f27f3a5c9f456f26dcfe16c4791667b ("src/include/liburing.h: style cleanups")
It is available in the Git repository at:
https://github.com/ammarfaizi2/liburing sendto-recvfrom
---
v2:
- Rebased the work, now this patchset is based on commit
bb3294e22482db4b7ec ("Merge branch 'for-5.17/drivers' into
for-next").
- In `io_recvfrom()`, mark the error check of `move_addr_to_user()`
call as unlikely.
- Fix build error when CONFIG_NET is undefined.
- Update liburing test (the branch is still the same, just force
pushed).
- Added Nugra to CC list (tester).
---
RFC v1: https://lore.kernel.org/io-uring/[email protected]/
Link: https://github.com/axboe/liburing/issues/397 [1]
Link: https://lore.kernel.org/io-uring/[email protected]/ [2]
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: [email protected]
Cc: Nugra <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---
Ammar Faizi (3):
io_uring: Rename `io_{send,recv}` to `io_{sendto,recvfrom}`
net: Make `move_addr_to_user()` be a non static function
io_uring: Add `sendto(2)` and `recvfrom(2)` support
fs/io_uring.c | 92 +++++++++++++++++++++++++++++++----
include/linux/socket.h | 2 +
include/uapi/linux/io_uring.h | 2 +
net/socket.c | 4 +-
4 files changed, 88 insertions(+), 12 deletions(-)
base-commit: bb3294e22482db4b7ec7cfbb2d0f5b53c1adcf86
--
2.32.0
Currently we can perform `send` and `recv` via io_uring. And now, we
are going to add `sendto` and `recvfrom` support for io_uring.
Note that:
Calling `send(fd, buf, len, flags)` is equivalent to calling
`sendto(fd, buf, len, flags, NULL, 0)`. Therefore, `sendto`
is a superset of `send`.
Calling `recv(fd, buf, len, flags)` is equivalent to calling
`recvfrom(fd, buf, len, flags, NULL, NULL)`. Therefore, `recvfrom`
is a superset of `recv`.
As such, let's direct the current supported `IORING_OP_{SEND,RECV}` to
`io_{sendto,recvfrom}`. These functions will also be used for
`IORING_OP_{SENDTO,RECVFROM}` operation in the next patches.
Cc: Nugra <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---
fs/io_uring.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 90002bb3fdf4..d564f98d5d3b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5273,7 +5273,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
return 0;
}
-static int io_send(struct io_kiocb *req, unsigned int issue_flags)
+static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_sr_msg *sr = &req->sr_msg;
struct msghdr msg;
@@ -5499,7 +5499,7 @@ static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
return 0;
}
-static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
+static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_buffer *kbuf;
struct io_sr_msg *sr = &req->sr_msg;
@@ -7061,13 +7061,13 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
ret = io_sendmsg(req, issue_flags);
break;
case IORING_OP_SEND:
- ret = io_send(req, issue_flags);
+ ret = io_sendto(req, issue_flags);
break;
case IORING_OP_RECVMSG:
ret = io_recvmsg(req, issue_flags);
break;
case IORING_OP_RECV:
- ret = io_recv(req, issue_flags);
+ ret = io_recvfrom(req, issue_flags);
break;
case IORING_OP_TIMEOUT:
ret = io_timeout(req, issue_flags);
--
2.32.0
In order to add recvfrom support for io_uring, we need to call
`move_addr_to_user()` in fs/io_uring.c.
This makes `move_addr_to_user()` be a non static function so we can
call it from io_uring.
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: [email protected]
Cc: Nugra <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---
include/linux/socket.h | 2 ++
net/socket.c | 4 ++--
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 8ef26d89ef49..0d0bc1ace50c 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -371,6 +371,8 @@ struct ucred {
#define IPX_TYPE 1
extern int move_addr_to_kernel(void __user *uaddr, int ulen, struct sockaddr_storage *kaddr);
+extern int move_addr_to_user(struct sockaddr_storage *kaddr, int klen,
+ void __user *uaddr, int __user *ulen);
extern int put_cmsg(struct msghdr*, int level, int type, int len, void *data);
struct timespec64;
diff --git a/net/socket.c b/net/socket.c
index 7f64a6eccf63..af521d351c8a 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -267,8 +267,8 @@ int move_addr_to_kernel(void __user *uaddr, int ulen, struct sockaddr_storage *k
* specified. Zero is returned for a success.
*/
-static int move_addr_to_user(struct sockaddr_storage *kaddr, int klen,
- void __user *uaddr, int __user *ulen)
+int move_addr_to_user(struct sockaddr_storage *kaddr, int klen,
+ void __user *uaddr, int __user *ulen)
{
int err;
int len;
--
2.32.0
This adds sendto(2) and recvfrom(2) support for io_uring.
New opcodes:
IORING_OP_SENDTO
IORING_OP_RECVFROM
Cc: Nugra <[email protected]>
Link: https://github.com/axboe/liburing/issues/397
Signed-off-by: Ammar Faizi <[email protected]>
---
v2:
- In `io_recvfrom()`, mark the error check of `move_addr_to_user()`
call as unlikely.
- Fix build error when CONFIG_NET is undefined.
fs/io_uring.c | 84 ++++++++++++++++++++++++++++++++---
include/uapi/linux/io_uring.h | 2 +
2 files changed, 80 insertions(+), 6 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index d564f98d5d3b..3726958f8f58 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -575,7 +575,15 @@ struct io_sr_msg {
union {
struct compat_msghdr __user *umsg_compat;
struct user_msghdr __user *umsg;
- void __user *buf;
+
+ struct {
+ void __user *buf;
+ struct sockaddr __user *addr;
+ union {
+ int sendto_addr_len;
+ int __user *recvfrom_addr_len;
+ };
+ };
};
int msg_flags;
int bgid;
@@ -1133,6 +1141,19 @@ static const struct io_op_def io_op_defs[] = {
.needs_file = 1
},
[IORING_OP_GETXATTR] = {},
+ [IORING_OP_SENDTO] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollout = 1,
+ .audit_skip = 1,
+ },
+ [IORING_OP_RECVFROM] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollin = 1,
+ .buffer_select = 1,
+ .audit_skip = 1,
+ },
};
/* requests with any of those set should undergo io_disarm_next() */
@@ -5216,12 +5237,24 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
return -EINVAL;
+ /*
+ * For IORING_OP_SEND{,TO}, the assignment to @sr->umsg
+ * is equivalent to an assignment to @sr->buf.
+ */
sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
+
sr->len = READ_ONCE(sqe->len);
sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
if (sr->msg_flags & MSG_DONTWAIT)
req->flags |= REQ_F_NOWAIT;
+ if (req->opcode == IORING_OP_SENDTO) {
+ sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+ sr->sendto_addr_len = READ_ONCE(sqe->addr3);
+ } else {
+ sr->addr = (struct sockaddr __user *) NULL;
+ }
+
#ifdef CONFIG_COMPAT
if (req->ctx->compat)
sr->msg_flags |= MSG_CMSG_COMPAT;
@@ -5275,6 +5308,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
{
+ struct sockaddr_storage address;
struct io_sr_msg *sr = &req->sr_msg;
struct msghdr msg;
struct iovec iov;
@@ -5291,10 +5325,20 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
if (unlikely(ret))
return ret;
- msg.msg_name = NULL;
+
msg.msg_control = NULL;
msg.msg_controllen = 0;
- msg.msg_namelen = 0;
+ if (sr->addr) {
+ ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
+ &address);
+ if (unlikely(ret < 0))
+ goto fail;
+ msg.msg_name = (struct sockaddr *) &address;
+ msg.msg_namelen = sr->sendto_addr_len;
+ } else {
+ msg.msg_name = NULL;
+ msg.msg_namelen = 0;
+ }
flags = req->sr_msg.msg_flags;
if (issue_flags & IO_URING_F_NONBLOCK)
@@ -5309,6 +5353,7 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
return -EAGAIN;
if (ret == -ERESTARTSYS)
ret = -EINTR;
+ fail:
req_set_fail(req);
}
__io_req_complete(req, issue_flags, ret, 0);
@@ -5427,13 +5472,25 @@ static int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
return -EINVAL;
+ /*
+ * For IORING_OP_RECV{,FROM}, the assignment to @sr->umsg
+ * is equivalent to an assignment to @sr->buf.
+ */
sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
+
sr->len = READ_ONCE(sqe->len);
sr->bgid = READ_ONCE(sqe->buf_group);
sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
if (sr->msg_flags & MSG_DONTWAIT)
req->flags |= REQ_F_NOWAIT;
+ if (req->opcode == IORING_OP_RECVFROM) {
+ sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+ sr->recvfrom_addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr3));
+ } else {
+ sr->addr = (struct sockaddr __user *) NULL;
+ }
+
#ifdef CONFIG_COMPAT
if (req->ctx->compat)
sr->msg_flags |= MSG_CMSG_COMPAT;
@@ -5509,6 +5566,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
struct iovec iov;
unsigned flags;
int ret, min_ret = 0;
+ struct sockaddr_storage address;
bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
sock = sock_from_file(req->file);
@@ -5526,7 +5584,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
if (unlikely(ret))
goto out_free;
- msg.msg_name = NULL;
+ msg.msg_name = sr->addr ? (struct sockaddr *) &address : NULL;
msg.msg_control = NULL;
msg.msg_controllen = 0;
msg.msg_namelen = 0;
@@ -5540,6 +5598,16 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
min_ret = iov_iter_count(&msg.msg_iter);
ret = sock_recvmsg(sock, &msg, flags);
+
+ if (ret >= 0 && sr->addr != NULL) {
+ int tmp;
+
+ tmp = move_addr_to_user(&address, msg.msg_namelen, sr->addr,
+ sr->recvfrom_addr_len);
+ if (unlikely(tmp < 0))
+ ret = tmp;
+ }
+
out_free:
if (ret < min_ret) {
if (ret == -EAGAIN && force_nonblock)
@@ -5707,8 +5775,8 @@ IO_NETOP_PREP_ASYNC(sendmsg);
IO_NETOP_PREP_ASYNC(recvmsg);
IO_NETOP_PREP_ASYNC(connect);
IO_NETOP_PREP(accept);
-IO_NETOP_FN(send);
-IO_NETOP_FN(recv);
+IO_NETOP_FN(sendto);
+IO_NETOP_FN(recvfrom);
#endif /* CONFIG_NET */
struct io_poll_table {
@@ -6778,9 +6846,11 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
case IORING_OP_SYNC_FILE_RANGE:
return io_sfr_prep(req, sqe);
case IORING_OP_SENDMSG:
+ case IORING_OP_SENDTO:
case IORING_OP_SEND:
return io_sendmsg_prep(req, sqe);
case IORING_OP_RECVMSG:
+ case IORING_OP_RECVFROM:
case IORING_OP_RECV:
return io_recvmsg_prep(req, sqe);
case IORING_OP_CONNECT:
@@ -7060,12 +7130,14 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
case IORING_OP_SENDMSG:
ret = io_sendmsg(req, issue_flags);
break;
+ case IORING_OP_SENDTO:
case IORING_OP_SEND:
ret = io_sendto(req, issue_flags);
break;
case IORING_OP_RECVMSG:
ret = io_recvmsg(req, issue_flags);
break;
+ case IORING_OP_RECVFROM:
case IORING_OP_RECV:
ret = io_recvfrom(req, issue_flags);
break;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index efc7ac9b3a6b..a360069d1e8e 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -150,6 +150,8 @@ enum {
IORING_OP_SETXATTR,
IORING_OP_FGETXATTR,
IORING_OP_GETXATTR,
+ IORING_OP_SENDTO,
+ IORING_OP_RECVFROM,
/* this goes last, obviously */
IORING_OP_LAST,
--
2.32.0
Hello,
This RFC patchset adds sendto(2) and recvfrom(2) support for io_uring.
It also addresses an issue in the liburing GitHub repository [1].
## Motivations:
1) By using `sendto()` and `recvfrom()` we can make the submission
simpler compared to always using `sendmsg()` and `recvmsg()` from
the userspace.
2) There is a historical patch that tried to add the same
functionality, but did not end up being applied. [2]
On Tue, 7 Jul 2020 12:29:18 -0600, Jens Axboe <[email protected]> wrote:
> In a private conversation with the author, a good point was brought
> up that the sendto/recvfrom do not require an allocation of an async
> context, if we need to defer or go async with the request. I think
> that's a major win, to be honest. There are other benefits as well
> (like shorter path), but to me, the async less part is nice and will
> reduce overhead
## Changes summary
There are 3 patches in this series.
PATCH 1/3 renames io_recv to io_recvfrom and io_send to io_sendto.
Note that
send(sockfd, buf, len, flags);
is equivalent to
sendto(sockfd, buf, len, flags, NULL, 0);
and
recv(sockfd, buf, len, flags);
is equivalent to
recvfrom(sockfd, buf, len, flags, NULL, NULL);
So it is saner to have `send` and `recv` directed to `sendto` and
`recvfrom` instead of the opposite with respect to the name.
PATCH 2/3 makes `move_addr_to_user()` be a non static function. This
function lives in net/socket.c, we need to call this from io_uring
to add `recvfrom()` support for liburing. Added net files maintainers
to the CC list.
PATCH 3/3 adds `sendto(2)` and `recvfrom(2)` support for io_uring.
Added two new opcodes: IORING_OP_SENDTO and IORING_OP_RECVFROM.
## How to test
This patchset is based on "for-next" branch commit:
bb3294e22482db4b7ec7cfbb2d0f5b53c1adcf86 ("Merge branch 'for-5.17/drivers' into for-next")
It is also available in the Git repository at:
https://github.com/ammarfaizi2/linux-block ammarfaizi2/linux-block/io_uring-recvfrom-sendto
I also added the liburing support and test. The liburing support is
based on "xattr-getdents64" branch commit:
55a9bf979f27f3a5c9f456f26dcfe16c4791667b ("src/include/liburing.h: style cleanups")
It is available in the Git repository at:
https://github.com/ammarfaizi2/liburing sendto-recvfrom
---
v3:
- Fix build error when CONFIG_NET is undefined for PATCH 1/3. I
tried to fix it in PATCH 3/3, but it should be fixed in PATCH 1/3,
otherwise it breaks the build in PATCH 1/3.
- Added `io_uring_prep_{sendto,recvfrom}` docs to the liburing.
v2:
- Rebased the work, now this patchset is based on commit
bb3294e22482db4b7ec ("Merge branch 'for-5.17/drivers' into
for-next").
- In `io_recvfrom()`, mark the error check of `move_addr_to_user()`
call as unlikely.
- Fix build error when CONFIG_NET is undefined.
- Update liburing test (the branch is still the same, just force
pushed).
- Added Nugra to CC list (tester).
---
RFC v2: https://lore.kernel.org/io-uring/[email protected]/
RFC v1: https://lore.kernel.org/io-uring/[email protected]/
Link: https://github.com/axboe/liburing/issues/397 [1]
Link: https://lore.kernel.org/io-uring/[email protected]/ [2]
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: [email protected]
Cc: Nugra <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---
Ammar Faizi (3):
io_uring: Rename `io_{send,recv}` to `io_{sendto,recvfrom}`
net: Make `move_addr_to_user()` be a non static function
io_uring: Add `sendto(2)` and `recvfrom(2)` support
fs/io_uring.c | 92 +++++++++++++++++++++++++++++++----
include/linux/socket.h | 2 +
include/uapi/linux/io_uring.h | 2 +
net/socket.c | 4 +-
4 files changed, 88 insertions(+), 12 deletions(-)
base-commit: bb3294e22482db4b7ec7cfbb2d0f5b53c1adcf86
--
Ammar Faizi
Currently we can perform `send` and `recv` via io_uring. And now, we
are going to add `sendto` and `recvfrom` support for io_uring.
Note that:
Calling `send(fd, buf, len, flags)` is equivalent to calling
`sendto(fd, buf, len, flags, NULL, 0)`. Therefore, `sendto`
is a superset of `send`.
Calling `recv(fd, buf, len, flags)` is equivalent to calling
`recvfrom(fd, buf, len, flags, NULL, NULL)`. Therefore, `recvfrom`
is a superset of `recv`.
As such, let's direct the current supported `IORING_OP_{SEND,RECV}` to
`io_{sendto,recvfrom}`. These functions will also be used for
`IORING_OP_{SENDTO,RECVFROM}` operation in the next patches.
Cc: Nugra <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---
v3:
- Fix build error when CONFIG_NET is undefined for PATCH 1/3. I
tried to fix it in PATCH 3/3, but it should be fixed in PATCH 1/3,
otherwise it breaks the build in PATCH 1/3.
v2:
- Added Nugra to CC list (tester).
---
fs/io_uring.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 90002bb3fdf4..7adcb591398f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5273,7 +5273,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
return 0;
}
-static int io_send(struct io_kiocb *req, unsigned int issue_flags)
+static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_sr_msg *sr = &req->sr_msg;
struct msghdr msg;
@@ -5499,7 +5499,7 @@ static int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
return 0;
}
-static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
+static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_buffer *kbuf;
struct io_sr_msg *sr = &req->sr_msg;
@@ -5707,8 +5707,8 @@ IO_NETOP_PREP_ASYNC(sendmsg);
IO_NETOP_PREP_ASYNC(recvmsg);
IO_NETOP_PREP_ASYNC(connect);
IO_NETOP_PREP(accept);
-IO_NETOP_FN(send);
-IO_NETOP_FN(recv);
+IO_NETOP_FN(sendto);
+IO_NETOP_FN(recvfrom);
#endif /* CONFIG_NET */
struct io_poll_table {
@@ -7061,13 +7061,13 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
ret = io_sendmsg(req, issue_flags);
break;
case IORING_OP_SEND:
- ret = io_send(req, issue_flags);
+ ret = io_sendto(req, issue_flags);
break;
case IORING_OP_RECVMSG:
ret = io_recvmsg(req, issue_flags);
break;
case IORING_OP_RECV:
- ret = io_recv(req, issue_flags);
+ ret = io_recvfrom(req, issue_flags);
break;
case IORING_OP_TIMEOUT:
ret = io_timeout(req, issue_flags);
--
2.32.0
In order to add recvfrom support for io_uring, we need to call
`move_addr_to_user()` in fs/io_uring.c.
This makes `move_addr_to_user()` be a non static function so we can
call it from io_uring.
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: [email protected]
Cc: Nugra <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---
v3:
* No changes *
v2:
- Added Nugra to CC list (tester).
---
include/linux/socket.h | 2 ++
net/socket.c | 4 ++--
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 8ef26d89ef49..0d0bc1ace50c 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -371,6 +371,8 @@ struct ucred {
#define IPX_TYPE 1
extern int move_addr_to_kernel(void __user *uaddr, int ulen, struct sockaddr_storage *kaddr);
+extern int move_addr_to_user(struct sockaddr_storage *kaddr, int klen,
+ void __user *uaddr, int __user *ulen);
extern int put_cmsg(struct msghdr*, int level, int type, int len, void *data);
struct timespec64;
diff --git a/net/socket.c b/net/socket.c
index 7f64a6eccf63..af521d351c8a 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -267,8 +267,8 @@ int move_addr_to_kernel(void __user *uaddr, int ulen, struct sockaddr_storage *k
* specified. Zero is returned for a success.
*/
-static int move_addr_to_user(struct sockaddr_storage *kaddr, int klen,
- void __user *uaddr, int __user *ulen)
+int move_addr_to_user(struct sockaddr_storage *kaddr, int klen,
+ void __user *uaddr, int __user *ulen)
{
int err;
int len;
--
2.32.0
This adds sendto(2) and recvfrom(2) support for io_uring.
New opcodes:
IORING_OP_SENDTO
IORING_OP_RECVFROM
Cc: Nugra <[email protected]>
Tested-by: Nugra <[email protected]>
Link: https://github.com/axboe/liburing/issues/397
Signed-off-by: Ammar Faizi <[email protected]>
---
v3:
- Fix build error when CONFIG_NET is undefined should be done in
the first patch, not this patch.
- Add Tested-by tag from Nugra.
v2:
- In `io_recvfrom()`, mark the error check of `move_addr_to_user()`
call as unlikely.
- Fix build error when CONFIG_NET is undefined.
- Added Nugra to CC list (tester).
---
fs/io_uring.c | 80 +++++++++++++++++++++++++++++++++--
include/uapi/linux/io_uring.h | 2 +
2 files changed, 78 insertions(+), 4 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7adcb591398f..3726958f8f58 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -575,7 +575,15 @@ struct io_sr_msg {
union {
struct compat_msghdr __user *umsg_compat;
struct user_msghdr __user *umsg;
- void __user *buf;
+
+ struct {
+ void __user *buf;
+ struct sockaddr __user *addr;
+ union {
+ int sendto_addr_len;
+ int __user *recvfrom_addr_len;
+ };
+ };
};
int msg_flags;
int bgid;
@@ -1133,6 +1141,19 @@ static const struct io_op_def io_op_defs[] = {
.needs_file = 1
},
[IORING_OP_GETXATTR] = {},
+ [IORING_OP_SENDTO] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollout = 1,
+ .audit_skip = 1,
+ },
+ [IORING_OP_RECVFROM] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ .pollin = 1,
+ .buffer_select = 1,
+ .audit_skip = 1,
+ },
};
/* requests with any of those set should undergo io_disarm_next() */
@@ -5216,12 +5237,24 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
return -EINVAL;
+ /*
+ * For IORING_OP_SEND{,TO}, the assignment to @sr->umsg
+ * is equivalent to an assignment to @sr->buf.
+ */
sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
+
sr->len = READ_ONCE(sqe->len);
sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
if (sr->msg_flags & MSG_DONTWAIT)
req->flags |= REQ_F_NOWAIT;
+ if (req->opcode == IORING_OP_SENDTO) {
+ sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+ sr->sendto_addr_len = READ_ONCE(sqe->addr3);
+ } else {
+ sr->addr = (struct sockaddr __user *) NULL;
+ }
+
#ifdef CONFIG_COMPAT
if (req->ctx->compat)
sr->msg_flags |= MSG_CMSG_COMPAT;
@@ -5275,6 +5308,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
{
+ struct sockaddr_storage address;
struct io_sr_msg *sr = &req->sr_msg;
struct msghdr msg;
struct iovec iov;
@@ -5291,10 +5325,20 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
if (unlikely(ret))
return ret;
- msg.msg_name = NULL;
+
msg.msg_control = NULL;
msg.msg_controllen = 0;
- msg.msg_namelen = 0;
+ if (sr->addr) {
+ ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
+ &address);
+ if (unlikely(ret < 0))
+ goto fail;
+ msg.msg_name = (struct sockaddr *) &address;
+ msg.msg_namelen = sr->sendto_addr_len;
+ } else {
+ msg.msg_name = NULL;
+ msg.msg_namelen = 0;
+ }
flags = req->sr_msg.msg_flags;
if (issue_flags & IO_URING_F_NONBLOCK)
@@ -5309,6 +5353,7 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
return -EAGAIN;
if (ret == -ERESTARTSYS)
ret = -EINTR;
+ fail:
req_set_fail(req);
}
__io_req_complete(req, issue_flags, ret, 0);
@@ -5427,13 +5472,25 @@ static int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
return -EINVAL;
+ /*
+ * For IORING_OP_RECV{,FROM}, the assignment to @sr->umsg
+ * is equivalent to an assignment to @sr->buf.
+ */
sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
+
sr->len = READ_ONCE(sqe->len);
sr->bgid = READ_ONCE(sqe->buf_group);
sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
if (sr->msg_flags & MSG_DONTWAIT)
req->flags |= REQ_F_NOWAIT;
+ if (req->opcode == IORING_OP_RECVFROM) {
+ sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+ sr->recvfrom_addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr3));
+ } else {
+ sr->addr = (struct sockaddr __user *) NULL;
+ }
+
#ifdef CONFIG_COMPAT
if (req->ctx->compat)
sr->msg_flags |= MSG_CMSG_COMPAT;
@@ -5509,6 +5566,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
struct iovec iov;
unsigned flags;
int ret, min_ret = 0;
+ struct sockaddr_storage address;
bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
sock = sock_from_file(req->file);
@@ -5526,7 +5584,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
if (unlikely(ret))
goto out_free;
- msg.msg_name = NULL;
+ msg.msg_name = sr->addr ? (struct sockaddr *) &address : NULL;
msg.msg_control = NULL;
msg.msg_controllen = 0;
msg.msg_namelen = 0;
@@ -5540,6 +5598,16 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
min_ret = iov_iter_count(&msg.msg_iter);
ret = sock_recvmsg(sock, &msg, flags);
+
+ if (ret >= 0 && sr->addr != NULL) {
+ int tmp;
+
+ tmp = move_addr_to_user(&address, msg.msg_namelen, sr->addr,
+ sr->recvfrom_addr_len);
+ if (unlikely(tmp < 0))
+ ret = tmp;
+ }
+
out_free:
if (ret < min_ret) {
if (ret == -EAGAIN && force_nonblock)
@@ -6778,9 +6846,11 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
case IORING_OP_SYNC_FILE_RANGE:
return io_sfr_prep(req, sqe);
case IORING_OP_SENDMSG:
+ case IORING_OP_SENDTO:
case IORING_OP_SEND:
return io_sendmsg_prep(req, sqe);
case IORING_OP_RECVMSG:
+ case IORING_OP_RECVFROM:
case IORING_OP_RECV:
return io_recvmsg_prep(req, sqe);
case IORING_OP_CONNECT:
@@ -7060,12 +7130,14 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
case IORING_OP_SENDMSG:
ret = io_sendmsg(req, issue_flags);
break;
+ case IORING_OP_SENDTO:
case IORING_OP_SEND:
ret = io_sendto(req, issue_flags);
break;
case IORING_OP_RECVMSG:
ret = io_recvmsg(req, issue_flags);
break;
+ case IORING_OP_RECVFROM:
case IORING_OP_RECV:
ret = io_recvfrom(req, issue_flags);
break;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index efc7ac9b3a6b..a360069d1e8e 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -150,6 +150,8 @@ enum {
IORING_OP_SETXATTR,
IORING_OP_FGETXATTR,
IORING_OP_GETXATTR,
+ IORING_OP_SENDTO,
+ IORING_OP_RECVFROM,
/* this goes last, obviously */
IORING_OP_LAST,
--
2.32.0
On 30-12-2021 23:22, Ammar Faizi wrote:
> This adds sendto(2) and recvfrom(2) support for io_uring.
>
> New opcodes:
> IORING_OP_SENDTO
> IORING_OP_RECVFROM
>
> Cc: Nugra <[email protected]>
> Tested-by: Nugra <[email protected]>
> Link: https://github.com/axboe/liburing/issues/397
> Signed-off-by: Ammar Faizi <[email protected]>
> ---
>
> v3:
> - Fix build error when CONFIG_NET is undefined should be done in
> the first patch, not this patch.
>
> - Add Tested-by tag from Nugra.
>
> v2:
> - In `io_recvfrom()`, mark the error check of `move_addr_to_user()`
> call as unlikely.
>
> - Fix build error when CONFIG_NET is undefined.
>
> - Added Nugra to CC list (tester).
> ---
> fs/io_uring.c | 80 +++++++++++++++++++++++++++++++++--
> include/uapi/linux/io_uring.h | 2 +
> 2 files changed, 78 insertions(+), 4 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 7adcb591398f..3726958f8f58 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -575,7 +575,15 @@ struct io_sr_msg {
> union {
> struct compat_msghdr __user *umsg_compat;
> struct user_msghdr __user *umsg;
> - void __user *buf;
> +
> + struct {
> + void __user *buf;
> + struct sockaddr __user *addr;
> + union {
> + int sendto_addr_len;
> + int __user *recvfrom_addr_len;
> + };
> + };
> };
> int msg_flags;
> int bgid;
> @@ -1133,6 +1141,19 @@ static const struct io_op_def io_op_defs[] = {
> .needs_file = 1
> },
> [IORING_OP_GETXATTR] = {},
> + [IORING_OP_SENDTO] = {
> + .needs_file = 1,
> + .unbound_nonreg_file = 1,
> + .pollout = 1,
> + .audit_skip = 1,
> + },
> + [IORING_OP_RECVFROM] = {
> + .needs_file = 1,
> + .unbound_nonreg_file = 1,
> + .pollin = 1,
> + .buffer_select = 1,
> + .audit_skip = 1,
> + },
> };
>
> /* requests with any of those set should undergo io_disarm_next() */
> @@ -5216,12 +5237,24 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
> return -EINVAL;
>
> + /*
> + * For IORING_OP_SEND{,TO}, the assignment to @sr->umsg
> + * is equivalent to an assignment to @sr->buf.
> + */
> sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
> +
> sr->len = READ_ONCE(sqe->len);
> sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
> if (sr->msg_flags & MSG_DONTWAIT)
> req->flags |= REQ_F_NOWAIT;
>
> + if (req->opcode == IORING_OP_SENDTO) {
> + sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
> + sr->sendto_addr_len = READ_ONCE(sqe->addr3);
> + } else {
> + sr->addr = (struct sockaddr __user *) NULL;
Let's have sendto_addr_len = 0
> + }
> +
> #ifdef CONFIG_COMPAT
> if (req->ctx->compat)
> sr->msg_flags |= MSG_CMSG_COMPAT;
> @@ -5275,6 +5308,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
>
> static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
> {
> + struct sockaddr_storage address;
> struct io_sr_msg *sr = &req->sr_msg;
> struct msghdr msg;
> struct iovec iov;
> @@ -5291,10 +5325,20 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
> if (unlikely(ret))
> return ret;
>
> - msg.msg_name = NULL;
> +
> msg.msg_control = NULL;
> msg.msg_controllen = 0;
> - msg.msg_namelen = 0;
> + if (sr->addr) {
> + ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
> + &address);
> + if (unlikely(ret < 0))
> + goto fail;
> + msg.msg_name = (struct sockaddr *) &address;
> + msg.msg_namelen = sr->sendto_addr_len;
> + } else {
> + msg.msg_name = NULL;
> + msg.msg_namelen = 0;
> + }
>
> flags = req->sr_msg.msg_flags;
> if (issue_flags & IO_URING_F_NONBLOCK)
> @@ -5309,6 +5353,7 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
> return -EAGAIN;
> if (ret == -ERESTARTSYS)
> ret = -EINTR;
> + fail:
> req_set_fail(req);
I think there is a problem with "fail" goto statement. Not getting full clarity on this change. With latest kernel, I see req_set_fail(req) inside if check, which I don't see here. Can you please resend the patch on latest kernel version. Thanks.
> }
> __io_req_complete(req, issue_flags, ret, 0);
> @@ -5427,13 +5472,25 @@ static int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
> return -EINVAL;
>
> + /*
> + * For IORING_OP_RECV{,FROM}, the assignment to @sr->umsg
> + * is equivalent to an assignment to @sr->buf.
> + */
> sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
> +
> sr->len = READ_ONCE(sqe->len);
> sr->bgid = READ_ONCE(sqe->buf_group);
> sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
> if (sr->msg_flags & MSG_DONTWAIT)
> req->flags |= REQ_F_NOWAIT;
>
> + if (req->opcode == IORING_OP_RECVFROM) {
> + sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
> + sr->recvfrom_addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr3));
> + } else {
> + sr->addr = (struct sockaddr __user *) NULL;
I think recvfrom_addr_len should also be pointed to NULL, instead of garbage for this case.
> + }
> +
> #ifdef CONFIG_COMPAT
> if (req->ctx->compat)
> sr->msg_flags |= MSG_CMSG_COMPAT;
> @@ -5509,6 +5566,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
> struct iovec iov;
> unsigned flags;
> int ret, min_ret = 0;
> + struct sockaddr_storage address;
> bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
>
> sock = sock_from_file(req->file);
> @@ -5526,7 +5584,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
> if (unlikely(ret))
> goto out_free;
>
> - msg.msg_name = NULL;
> + msg.msg_name = sr->addr ? (struct sockaddr *) &address : NULL;
> msg.msg_control = NULL;
> msg.msg_controllen = 0;
> msg.msg_namelen = 0;
I think namelen should also be updated ?
> @@ -5540,6 +5598,16 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
> min_ret = iov_iter_count(&msg.msg_iter);
>
> ret = sock_recvmsg(sock, &msg, flags);
> +
> + if (ret >= 0 && sr->addr != NULL) {
> + int tmp;
> +
> + tmp = move_addr_to_user(&address, msg.msg_namelen, sr->addr,
> + sr->recvfrom_addr_len);
> + if (unlikely(tmp < 0))
> + ret = tmp;
> + }
> +
> out_free:
> if (ret < min_ret) {
> if (ret == -EAGAIN && force_nonblock)
> @@ -6778,9 +6846,11 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> case IORING_OP_SYNC_FILE_RANGE:
> return io_sfr_prep(req, sqe);
> case IORING_OP_SENDMSG:
> + case IORING_OP_SENDTO:
> case IORING_OP_SEND:
> return io_sendmsg_prep(req, sqe);
> case IORING_OP_RECVMSG:
> + case IORING_OP_RECVFROM:
> case IORING_OP_RECV:
> return io_recvmsg_prep(req, sqe);
> case IORING_OP_CONNECT:
> @@ -7060,12 +7130,14 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
> case IORING_OP_SENDMSG:
> ret = io_sendmsg(req, issue_flags);
> break;
> + case IORING_OP_SENDTO:
> case IORING_OP_SEND:
> ret = io_sendto(req, issue_flags);
> break;
> case IORING_OP_RECVMSG:
> ret = io_recvmsg(req, issue_flags);
> break;
> + case IORING_OP_RECVFROM:
> case IORING_OP_RECV:
> ret = io_recvfrom(req, issue_flags);
> break;
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index efc7ac9b3a6b..a360069d1e8e 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -150,6 +150,8 @@ enum {
> IORING_OP_SETXATTR,
> IORING_OP_FGETXATTR,
> IORING_OP_GETXATTR,
> + IORING_OP_SENDTO,
> + IORING_OP_RECVFROM,
>
> /* this goes last, obviously */
> IORING_OP_LAST,
Regards,
~Praveen.
On Thu, 6 Jan 2022 at 23:01:59 +0530, Praveen Kumar <[email protected]> wrote:
> On 30-12-2021 23:22, Ammar Faizi wrote:
>> This adds sendto(2) and recvfrom(2) support for io_uring.
>>
>> New opcodes:
>> IORING_OP_SENDTO
>> IORING_OP_RECVFROM
>>
>> Cc: Nugra <[email protected]>
>> Tested-by: Nugra <[email protected]>
>> Link: https://github.com/axboe/liburing/issues/397
>> Signed-off-by: Ammar Faizi <[email protected]>
>> ---
>>
>> v3:
>> - Fix build error when CONFIG_NET is undefined should be done in
>> the first patch, not this patch.
>>
>> - Add Tested-by tag from Nugra.
>>
>> v2:
>> - In `io_recvfrom()`, mark the error check of `move_addr_to_user()`
>> call as unlikely.
>>
>> - Fix build error when CONFIG_NET is undefined.
>>
>> - Added Nugra to CC list (tester).
>> ---
>> fs/io_uring.c | 80 +++++++++++++++++++++++++++++++++--
>> include/uapi/linux/io_uring.h | 2 +
>> 2 files changed, 78 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 7adcb591398f..3726958f8f58 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -575,7 +575,15 @@ struct io_sr_msg {
>> union {
>> struct compat_msghdr __user *umsg_compat;
>> struct user_msghdr __user *umsg;
>> - void __user *buf;
>> +
>> + struct {
>> + void __user *buf;
>> + struct sockaddr __user *addr;
>> + union {
>> + int sendto_addr_len;
>> + int __user *recvfrom_addr_len;
>> + };
>> + };
>> };
>> int msg_flags;
>> int bgid;
>> @@ -1133,6 +1141,19 @@ static const struct io_op_def io_op_defs[] = {
>> .needs_file = 1
>> },
>> [IORING_OP_GETXATTR] = {},
>> + [IORING_OP_SENDTO] = {
>> + .needs_file = 1,
>> + .unbound_nonreg_file = 1,
>> + .pollout = 1,
>> + .audit_skip = 1,
>> + },
>> + [IORING_OP_RECVFROM] = {
>> + .needs_file = 1,
>> + .unbound_nonreg_file = 1,
>> + .pollin = 1,
>> + .buffer_select = 1,
>> + .audit_skip = 1,
>> + },
>> };
>>
>> /* requests with any of those set should undergo io_disarm_next() */
>> @@ -5216,12 +5237,24 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>> return -EINVAL;
>>
>> + /*
>> + * For IORING_OP_SEND{,TO}, the assignment to @sr->umsg
>> + * is equivalent to an assignment to @sr->buf.
>> + */
>> sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
>> +
>> sr->len = READ_ONCE(sqe->len);
>> sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
>> if (sr->msg_flags & MSG_DONTWAIT)
>> req->flags |= REQ_F_NOWAIT;
>>
>> + if (req->opcode == IORING_OP_SENDTO) {
>> + sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
>> + sr->sendto_addr_len = READ_ONCE(sqe->addr3);
>> + } else {
>> + sr->addr = (struct sockaddr __user *) NULL;
>
> Let's have sendto_addr_len = 0
Will do in the RFC v5.
>
>> + }
>> +
>> #ifdef CONFIG_COMPAT
>> if (req->ctx->compat)
>> sr->msg_flags |= MSG_CMSG_COMPAT;
>> @@ -5275,6 +5308,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
>>
>> static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>> {
>> + struct sockaddr_storage address;
>> struct io_sr_msg *sr = &req->sr_msg;
>> struct msghdr msg;
>> struct iovec iov;
>> @@ -5291,10 +5325,20 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>> if (unlikely(ret))
>> return ret;
>>
>> - msg.msg_name = NULL;
>> +
>> msg.msg_control = NULL;
>> msg.msg_controllen = 0;
>> - msg.msg_namelen = 0;
>> + if (sr->addr) {
>> + ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
>> + &address);
>> + if (unlikely(ret < 0))
>> + goto fail;
>> + msg.msg_name = (struct sockaddr *) &address;
>> + msg.msg_namelen = sr->sendto_addr_len;
>> + } else {
>> + msg.msg_name = NULL;
>> + msg.msg_namelen = 0;
>> + }
>>
>> flags = req->sr_msg.msg_flags;
>> if (issue_flags & IO_URING_F_NONBLOCK)
>> @@ -5309,6 +5353,7 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>> return -EAGAIN;
>> if (ret == -ERESTARTSYS)
>> ret = -EINTR;
>> + fail:
>> req_set_fail(req);
>
> I think there is a problem with "fail" goto statement. Not getting
> full clarity on this change. With latest kernel, I see
> req_set_fail(req) inside if check, which I don't see here. Can you
> please resend the patch on latest kernel version. Thanks.
I will send the v5 on top of "for-next" branch in Jens' tree soon.
That is already inside an "if check" anyway. We go to that label when
the move_addr_to_kernel() fails (most of the time it is -EFAULT or
-EINVAL).
That part looks like this (note the if check before the goto):
----------------------------------------------------------------------
msg.msg_control = NULL;
msg.msg_controllen = 0;
if (sr->addr) {
ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
&address);
if (unlikely(ret < 0))
goto fail;
msg.msg_name = (struct sockaddr *) &address;
msg.msg_namelen = sr->sendto_addr_len;
} else {
msg.msg_name = NULL;
msg.msg_namelen = 0;
}
flags = req->sr_msg.msg_flags;
if (issue_flags & IO_URING_F_NONBLOCK)
flags |= MSG_DONTWAIT;
if (flags & MSG_WAITALL)
min_ret = iov_iter_count(&msg.msg_iter);
msg.msg_flags = flags;
ret = sock_sendmsg(sock, &msg);
if (ret < min_ret) {
if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
return -EAGAIN;
if (ret == -ERESTARTSYS)
ret = -EINTR;
fail:
req_set_fail(req);
}
__io_req_complete(req, issue_flags, ret, 0);
return 0;
----------------------------------------------------------------------
>> }
>> __io_req_complete(req, issue_flags, ret, 0);
>> @@ -5427,13 +5472,25 @@ static int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>> return -EINVAL;
>>
>> + /*
>> + * For IORING_OP_RECV{,FROM}, the assignment to @sr->umsg
>> + * is equivalent to an assignment to @sr->buf.
>> + */
>> sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
>> +
>> sr->len = READ_ONCE(sqe->len);
>> sr->bgid = READ_ONCE(sqe->buf_group);
>> sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
>> if (sr->msg_flags & MSG_DONTWAIT)
>> req->flags |= REQ_F_NOWAIT;
>>
>> + if (req->opcode == IORING_OP_RECVFROM) {
>> + sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
>> + sr->recvfrom_addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr3));
>> + } else {
>> + sr->addr = (struct sockaddr __user *) NULL;
>
> I think recvfrom_addr_len should also be pointed to NULL, instead of
> garbage for this case.
Will do in the RFC v5.
>
>> + }
>> +
>> #ifdef CONFIG_COMPAT
>> if (req->ctx->compat)
>> sr->msg_flags |= MSG_CMSG_COMPAT;
>> @@ -5509,6 +5566,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>> struct iovec iov;
>> unsigned flags;
>> int ret, min_ret = 0;
>> + struct sockaddr_storage address;
>> bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
>>
>> sock = sock_from_file(req->file);
>> @@ -5526,7 +5584,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>> if (unlikely(ret))
>> goto out_free;
>>
>> - msg.msg_name = NULL;
>> + msg.msg_name = sr->addr ? (struct sockaddr *) &address : NULL;
>> msg.msg_control = NULL;
>> msg.msg_controllen = 0;
>> msg.msg_namelen = 0;
>
> I think namelen should also be updated ?
It doesn't have to be updated. From net/socket.c there is a comment
like this:
/* We assume all kernel code knows the size of sockaddr_storage */
msg.msg_namelen = 0;
Full __sys_recvfrom() source code, see here:
https://github.com/torvalds/linux/blob/v5.16-rc8/net/socket.c#L2085-L2088
I will add the same comment in next series to clarify this one.
>
>> @@ -5540,6 +5598,16 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>> min_ret = iov_iter_count(&msg.msg_iter);
>>
>> ret = sock_recvmsg(sock, &msg, flags);
>> +
>> + if (ret >= 0 && sr->addr != NULL) {
>> + int tmp;
>> +
>> + tmp = move_addr_to_user(&address, msg.msg_namelen, sr->addr,
>> + sr->recvfrom_addr_len);
>> + if (unlikely(tmp < 0))
>> + ret = tmp;
>> + }
>> +
>> out_free:
>> if (ret < min_ret) {
>> if (ret == -EAGAIN && force_nonblock)
>> @@ -6778,9 +6846,11 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>> case IORING_OP_SYNC_FILE_RANGE:
>> return io_sfr_prep(req, sqe);
>> case IORING_OP_SENDMSG:
>> + case IORING_OP_SENDTO:
>> case IORING_OP_SEND:
>> return io_sendmsg_prep(req, sqe);
>> case IORING_OP_RECVMSG:
>> + case IORING_OP_RECVFROM:
>> case IORING_OP_RECV:
>> return io_recvmsg_prep(req, sqe);
>> case IORING_OP_CONNECT:
>> @@ -7060,12 +7130,14 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>> case IORING_OP_SENDMSG:
>> ret = io_sendmsg(req, issue_flags);
>> break;
>> + case IORING_OP_SENDTO:
>> case IORING_OP_SEND:
>> ret = io_sendto(req, issue_flags);
>> break;
>> case IORING_OP_RECVMSG:
>> ret = io_recvmsg(req, issue_flags);
>> break;
>> + case IORING_OP_RECVFROM:
>> case IORING_OP_RECV:
>> ret = io_recvfrom(req, issue_flags);
>> break;
>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>> index efc7ac9b3a6b..a360069d1e8e 100644
>> --- a/include/uapi/linux/io_uring.h
>> +++ b/include/uapi/linux/io_uring.h
>> @@ -150,6 +150,8 @@ enum {
>> IORING_OP_SETXATTR,
>> IORING_OP_FGETXATTR,
>> IORING_OP_GETXATTR,
>> + IORING_OP_SENDTO,
>> + IORING_OP_RECVFROM,
>>
>> /* this goes last, obviously */
>> IORING_OP_LAST,
>
>
> Regards,
>
> ~Praveen.
>
Thanks for the review. I will send the RFC v5 soon.
--
Ammar Faizi
On Fri, 7 Jan 2022 at 03:38:50 +0700, Ammar Faizi <[email protected]> wrote:
> On Thu, 6 Jan 2022 at 23:01:59 +0530, Praveen Kumar <[email protected]> wrote:
>> On 30-12-2021 23:22, Ammar Faizi wrote:
>>> This adds sendto(2) and recvfrom(2) support for io_uring.
>>>
>>> New opcodes:
>>> IORING_OP_SENDTO
>>> IORING_OP_RECVFROM
>>>
>>> Cc: Nugra <[email protected]>
>>> Tested-by: Nugra <[email protected]>
>>> Link: https://github.com/axboe/liburing/issues/397
>>> Signed-off-by: Ammar Faizi <[email protected]>
>>> ---
>>>
>>> v3:
>>> - Fix build error when CONFIG_NET is undefined should be done in
>>> the first patch, not this patch.
>>>
>>> - Add Tested-by tag from Nugra.
>>>
>>> v2:
>>> - In `io_recvfrom()`, mark the error check of `move_addr_to_user()`
>>> call as unlikely.
>>>
>>> - Fix build error when CONFIG_NET is undefined.
>>>
>>> - Added Nugra to CC list (tester).
>>> ---
>>> fs/io_uring.c | 80 +++++++++++++++++++++++++++++++++--
>>> include/uapi/linux/io_uring.h | 2 +
>>> 2 files changed, 78 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 7adcb591398f..3726958f8f58 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -575,7 +575,15 @@ struct io_sr_msg {
>>> union {
>>> struct compat_msghdr __user *umsg_compat;
>>> struct user_msghdr __user *umsg;
>>> - void __user *buf;
>>> +
>>> + struct {
>>> + void __user *buf;
>>> + struct sockaddr __user *addr;
>>> + union {
>>> + int sendto_addr_len;
>>> + int __user *recvfrom_addr_len;
>>> + };
>>> + };
>>> };
>>> int msg_flags;
>>> int bgid;
>>> @@ -1133,6 +1141,19 @@ static const struct io_op_def io_op_defs[] = {
>>> .needs_file = 1
>>> },
>>> [IORING_OP_GETXATTR] = {},
>>> + [IORING_OP_SENDTO] = {
>>> + .needs_file = 1,
>>> + .unbound_nonreg_file = 1,
>>> + .pollout = 1,
>>> + .audit_skip = 1,
>>> + },
>>> + [IORING_OP_RECVFROM] = {
>>> + .needs_file = 1,
>>> + .unbound_nonreg_file = 1,
>>> + .pollin = 1,
>>> + .buffer_select = 1,
>>> + .audit_skip = 1,
>>> + },
>>> };
>>>
>>> /* requests with any of those set should undergo io_disarm_next() */
>>> @@ -5216,12 +5237,24 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>> return -EINVAL;
>>>
>>> + /*
>>> + * For IORING_OP_SEND{,TO}, the assignment to @sr->umsg
>>> + * is equivalent to an assignment to @sr->buf.
>>> + */
>>> sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
>>> +
>>> sr->len = READ_ONCE(sqe->len);
>>> sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
>>> if (sr->msg_flags & MSG_DONTWAIT)
>>> req->flags |= REQ_F_NOWAIT;
>>>
>>> + if (req->opcode == IORING_OP_SENDTO) {
>>> + sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
>>> + sr->sendto_addr_len = READ_ONCE(sqe->addr3);
>>> + } else {
>>> + sr->addr = (struct sockaddr __user *) NULL;
>>
>> Let's have sendto_addr_len = 0
>
> Will do in the RFC v5.
>
>>
>>> + }
>>> +
>>> #ifdef CONFIG_COMPAT
>>> if (req->ctx->compat)
>>> sr->msg_flags |= MSG_CMSG_COMPAT;
>>> @@ -5275,6 +5308,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
>>>
>>> static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>>> {
>>> + struct sockaddr_storage address;
>>> struct io_sr_msg *sr = &req->sr_msg;
>>> struct msghdr msg;
>>> struct iovec iov;
>>> @@ -5291,10 +5325,20 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>>> if (unlikely(ret))
>>> return ret;
>>>
>>> - msg.msg_name = NULL;
>>> +
>>> msg.msg_control = NULL;
>>> msg.msg_controllen = 0;
>>> - msg.msg_namelen = 0;
>>> + if (sr->addr) {
>>> + ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
>>> + &address);
>>> + if (unlikely(ret < 0))
>>> + goto fail;
>>> + msg.msg_name = (struct sockaddr *) &address;
>>> + msg.msg_namelen = sr->sendto_addr_len;
>>> + } else {
>>> + msg.msg_name = NULL;
>>> + msg.msg_namelen = 0;
>>> + }
>>>
>>> flags = req->sr_msg.msg_flags;
>>> if (issue_flags & IO_URING_F_NONBLOCK)
>>> @@ -5309,6 +5353,7 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>>> return -EAGAIN;
>>> if (ret == -ERESTARTSYS)
>>> ret = -EINTR;
>>> + fail:
>>> req_set_fail(req);
>>
>> I think there is a problem with "fail" goto statement. Not getting
>> full clarity on this change. With latest kernel, I see
>> req_set_fail(req) inside if check, which I don't see here. Can you
>> please resend the patch on latest kernel version. Thanks.
>
> I will send the v5 on top of "for-next" branch in Jens' tree soon.
>
> That is already inside an "if check" anyway. We go to that label when
> the move_addr_to_kernel() fails (most of the time it is -EFAULT or
> -EINVAL).
>
> That part looks like this (note the if check before the goto):
> ----------------------------------------------------------------------
> msg.msg_control = NULL;
> msg.msg_controllen = 0;
> if (sr->addr) {
> ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
> &address);
> if (unlikely(ret < 0))
> goto fail;
> msg.msg_name = (struct sockaddr *) &address;
> msg.msg_namelen = sr->sendto_addr_len;
> } else {
> msg.msg_name = NULL;
> msg.msg_namelen = 0;
> }
>
> flags = req->sr_msg.msg_flags;
> if (issue_flags & IO_URING_F_NONBLOCK)
> flags |= MSG_DONTWAIT;
> if (flags & MSG_WAITALL)
> min_ret = iov_iter_count(&msg.msg_iter);
>
> msg.msg_flags = flags;
> ret = sock_sendmsg(sock, &msg);
> if (ret < min_ret) {
> if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
> return -EAGAIN;
> if (ret == -ERESTARTSYS)
> ret = -EINTR;
> fail:
> req_set_fail(req);
> }
> __io_req_complete(req, issue_flags, ret, 0);
> return 0;
> ----------------------------------------------------------------------
>
>>> }
>>> __io_req_complete(req, issue_flags, ret, 0);
>>> @@ -5427,13 +5472,25 @@ static int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>> return -EINVAL;
>>>
>>> + /*
>>> + * For IORING_OP_RECV{,FROM}, the assignment to @sr->umsg
>>> + * is equivalent to an assignment to @sr->buf.
>>> + */
>>> sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
>>> +
>>> sr->len = READ_ONCE(sqe->len);
>>> sr->bgid = READ_ONCE(sqe->buf_group);
>>> sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
>>> if (sr->msg_flags & MSG_DONTWAIT)
>>> req->flags |= REQ_F_NOWAIT;
>>>
>>> + if (req->opcode == IORING_OP_RECVFROM) {
>>> + sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
>>> + sr->recvfrom_addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr3));
>>> + } else {
>>> + sr->addr = (struct sockaddr __user *) NULL;
>>
>> I think recvfrom_addr_len should also be pointed to NULL, instead of
>> garbage for this case.
>
> Will do in the RFC v5.
>
>>
>>> + }
>>> +
>>> #ifdef CONFIG_COMPAT
>>> if (req->ctx->compat)
>>> sr->msg_flags |= MSG_CMSG_COMPAT;
>>> @@ -5509,6 +5566,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>>> struct iovec iov;
>>> unsigned flags;
>>> int ret, min_ret = 0;
>>> + struct sockaddr_storage address;
>>> bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
>>>
>>> sock = sock_from_file(req->file);
>>> @@ -5526,7 +5584,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>>> if (unlikely(ret))
>>> goto out_free;
>>>
>>> - msg.msg_name = NULL;
>>> + msg.msg_name = sr->addr ? (struct sockaddr *) &address : NULL;
>>> msg.msg_control = NULL;
>>> msg.msg_controllen = 0;
>>> msg.msg_namelen = 0;
>>
>> I think namelen should also be updated ?
>
> It doesn't have to be updated. From net/socket.c there is a comment
> like this:
>
> /* We assume all kernel code knows the size of sockaddr_storage */
> msg.msg_namelen = 0;
>
> Full __sys_recvfrom() source code, see here:
> https://github.com/torvalds/linux/blob/v5.16-rc8/net/socket.c#L2085-L2088
>
> I will add the same comment in next series to clarify this one.
>
>>
>>> @@ -5540,6 +5598,16 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>>> min_ret = iov_iter_count(&msg.msg_iter);
>>>
>>> ret = sock_recvmsg(sock, &msg, flags);
>>> +
>>> + if (ret >= 0 && sr->addr != NULL) {
>>> + int tmp;
>>> +
>>> + tmp = move_addr_to_user(&address, msg.msg_namelen, sr->addr,
>>> + sr->recvfrom_addr_len);
>>> + if (unlikely(tmp < 0))
>>> + ret = tmp;
>>> + }
>>> +
>>> out_free:
>>> if (ret < min_ret) {
>>> if (ret == -EAGAIN && force_nonblock)
>>> @@ -6778,9 +6846,11 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>> case IORING_OP_SYNC_FILE_RANGE:
>>> return io_sfr_prep(req, sqe);
>>> case IORING_OP_SENDMSG:
>>> + case IORING_OP_SENDTO:
>>> case IORING_OP_SEND:
>>> return io_sendmsg_prep(req, sqe);
>>> case IORING_OP_RECVMSG:
>>> + case IORING_OP_RECVFROM:
>>> case IORING_OP_RECV:
>>> return io_recvmsg_prep(req, sqe);
>>> case IORING_OP_CONNECT:
>>> @@ -7060,12 +7130,14 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>>> case IORING_OP_SENDMSG:
>>> ret = io_sendmsg(req, issue_flags);
>>> break;
>>> + case IORING_OP_SENDTO:
>>> case IORING_OP_SEND:
>>> ret = io_sendto(req, issue_flags);
>>> break;
>>> case IORING_OP_RECVMSG:
>>> ret = io_recvmsg(req, issue_flags);
>>> break;
>>> + case IORING_OP_RECVFROM:
>>> case IORING_OP_RECV:
>>> ret = io_recvfrom(req, issue_flags);
>>> break;
>>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>>> index efc7ac9b3a6b..a360069d1e8e 100644
>>> --- a/include/uapi/linux/io_uring.h
>>> +++ b/include/uapi/linux/io_uring.h
>>> @@ -150,6 +150,8 @@ enum {
>>> IORING_OP_SETXATTR,
>>> IORING_OP_FGETXATTR,
>>> IORING_OP_GETXATTR,
>>> + IORING_OP_SENDTO,
>>> + IORING_OP_RECVFROM,
>>>
>>> /* this goes last, obviously */
>>> IORING_OP_LAST,
>>
>>
>> Regards,
>>
>> ~Praveen.
>>
>
> Thanks for the review. I will send the RFC v5 soon.
>
> --
> Ammar Faizi
s/v5/v4/g
--
Ammar Faizi
On 07-01-2022 02:08, Ammar Faizi wrote:
>
> On Thu, 6 Jan 2022 at 23:01:59 +0530, Praveen Kumar <[email protected]> wrote:
>> On 30-12-2021 23:22, Ammar Faizi wrote:
>>> This adds sendto(2) and recvfrom(2) support for io_uring.
>>>
>>> New opcodes:
>>> IORING_OP_SENDTO
>>> IORING_OP_RECVFROM
>>>
>>> Cc: Nugra <[email protected]>
>>> Tested-by: Nugra <[email protected]>
>>> Link: https://github.com/axboe/liburing/issues/397
>>> Signed-off-by: Ammar Faizi <[email protected]>
>>> ---
>>>
>>> v3:
>>> - Fix build error when CONFIG_NET is undefined should be done in
>>> the first patch, not this patch.
>>>
>>> - Add Tested-by tag from Nugra.
>>>
>>> v2:
>>> - In `io_recvfrom()`, mark the error check of `move_addr_to_user()`
>>> call as unlikely.
>>>
>>> - Fix build error when CONFIG_NET is undefined.
>>>
>>> - Added Nugra to CC list (tester).
>>> ---
>>> fs/io_uring.c | 80 +++++++++++++++++++++++++++++++++--
>>> include/uapi/linux/io_uring.h | 2 +
>>> 2 files changed, 78 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 7adcb591398f..3726958f8f58 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -575,7 +575,15 @@ struct io_sr_msg {
>>> union {
>>> struct compat_msghdr __user *umsg_compat;
>>> struct user_msghdr __user *umsg;
>>> - void __user *buf;
>>> +
>>> + struct {
>>> + void __user *buf;
>>> + struct sockaddr __user *addr;
>>> + union {
>>> + int sendto_addr_len;
>>> + int __user *recvfrom_addr_len;
>>> + };
>>> + };
>>> };
>>> int msg_flags;
>>> int bgid;
>>> @@ -1133,6 +1141,19 @@ static const struct io_op_def io_op_defs[] = {
>>> .needs_file = 1
>>> },
>>> [IORING_OP_GETXATTR] = {},
>>> + [IORING_OP_SENDTO] = {
>>> + .needs_file = 1,
>>> + .unbound_nonreg_file = 1,
>>> + .pollout = 1,
>>> + .audit_skip = 1,
>>> + },
>>> + [IORING_OP_RECVFROM] = {
>>> + .needs_file = 1,
>>> + .unbound_nonreg_file = 1,
>>> + .pollin = 1,
>>> + .buffer_select = 1,
>>> + .audit_skip = 1,
>>> + },
>>> };
>>>
>>> /* requests with any of those set should undergo io_disarm_next() */
>>> @@ -5216,12 +5237,24 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>> return -EINVAL;
>>>
>>> + /*
>>> + * For IORING_OP_SEND{,TO}, the assignment to @sr->umsg
>>> + * is equivalent to an assignment to @sr->buf.
>>> + */
>>> sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
>>> +
>>> sr->len = READ_ONCE(sqe->len);
>>> sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
>>> if (sr->msg_flags & MSG_DONTWAIT)
>>> req->flags |= REQ_F_NOWAIT;
>>>
>>> + if (req->opcode == IORING_OP_SENDTO) {
>>> + sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
>>> + sr->sendto_addr_len = READ_ONCE(sqe->addr3);
>>> + } else {
>>> + sr->addr = (struct sockaddr __user *) NULL;
>>
>> Let's have sendto_addr_len = 0
>
> Will do in the RFC v5.
>
>>
>>> + }
>>> +
>>> #ifdef CONFIG_COMPAT
>>> if (req->ctx->compat)
>>> sr->msg_flags |= MSG_CMSG_COMPAT;
>>> @@ -5275,6 +5308,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
>>>
>>> static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>>> {
>>> + struct sockaddr_storage address;
>>> struct io_sr_msg *sr = &req->sr_msg;
>>> struct msghdr msg;
>>> struct iovec iov;
>>> @@ -5291,10 +5325,20 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>>> if (unlikely(ret))
>>> return ret;
>>>
>>> - msg.msg_name = NULL;
>>> +
>>> msg.msg_control = NULL;
>>> msg.msg_controllen = 0;
>>> - msg.msg_namelen = 0;
>>> + if (sr->addr) {
>>> + ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
>>> + &address);
>>> + if (unlikely(ret < 0))
>>> + goto fail;
>>> + msg.msg_name = (struct sockaddr *) &address;
>>> + msg.msg_namelen = sr->sendto_addr_len;
>>> + } else {
>>> + msg.msg_name = NULL;
>>> + msg.msg_namelen = 0;
>>> + }
>>>
>>> flags = req->sr_msg.msg_flags;
>>> if (issue_flags & IO_URING_F_NONBLOCK)
>>> @@ -5309,6 +5353,7 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>>> return -EAGAIN;
>>> if (ret == -ERESTARTSYS)
>>> ret = -EINTR;
>>> + fail:
>>> req_set_fail(req);
>>
>> I think there is a problem with "fail" goto statement. Not getting
>> full clarity on this change. With latest kernel, I see
>> req_set_fail(req) inside if check, which I don't see here. Can you
>> please resend the patch on latest kernel version. Thanks.
>
> I will send the v5 on top of "for-next" branch in Jens' tree soon.
>
> That is already inside an "if check" anyway. We go to that label when
> the move_addr_to_kernel() fails (most of the time it is -EFAULT or
> -EINVAL).
>
> That part looks like this (note the if check before the goto):
> ----------------------------------------------------------------------
> msg.msg_control = NULL;
> msg.msg_controllen = 0;
> if (sr->addr) {
> ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
> &address);
> if (unlikely(ret < 0))
> goto fail;
> msg.msg_name = (struct sockaddr *) &address;
> msg.msg_namelen = sr->sendto_addr_len;
> } else {
> msg.msg_name = NULL;
> msg.msg_namelen = 0;
> }
>
> flags = req->sr_msg.msg_flags;
> if (issue_flags & IO_URING_F_NONBLOCK)
> flags |= MSG_DONTWAIT;
> if (flags & MSG_WAITALL)
> min_ret = iov_iter_count(&msg.msg_iter);
>
> msg.msg_flags = flags;
> ret = sock_sendmsg(sock, &msg);
> if (ret < min_ret) {
> if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
> return -EAGAIN;
> if (ret == -ERESTARTSYS)
> ret = -EINTR;
> fail:
Thanks for sending this. IMO this goto label should be just before the "if (ret < min_ret)" statement.
> req_set_fail(req);
> }
> __io_req_complete(req, issue_flags, ret, 0);
> return 0;
> ----------------------------------------------------------------------
>
>>> }
>>> __io_req_complete(req, issue_flags, ret, 0);
>>> @@ -5427,13 +5472,25 @@ static int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>> return -EINVAL;
>>>
>>> + /*
>>> + * For IORING_OP_RECV{,FROM}, the assignment to @sr->umsg
>>> + * is equivalent to an assignment to @sr->buf.
>>> + */
>>> sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
>>> +
>>> sr->len = READ_ONCE(sqe->len);
>>> sr->bgid = READ_ONCE(sqe->buf_group);
>>> sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
>>> if (sr->msg_flags & MSG_DONTWAIT)
>>> req->flags |= REQ_F_NOWAIT;
>>>
>>> + if (req->opcode == IORING_OP_RECVFROM) {
>>> + sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
>>> + sr->recvfrom_addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr3));
>>> + } else {
>>> + sr->addr = (struct sockaddr __user *) NULL;
>>
>> I think recvfrom_addr_len should also be pointed to NULL, instead of
>> garbage for this case.
>
> Will do in the RFC v5.
>
>>
>>> + }
>>> +
>>> #ifdef CONFIG_COMPAT
>>> if (req->ctx->compat)
>>> sr->msg_flags |= MSG_CMSG_COMPAT;
>>> @@ -5509,6 +5566,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>>> struct iovec iov;
>>> unsigned flags;
>>> int ret, min_ret = 0;
>>> + struct sockaddr_storage address;
>>> bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
>>>
>>> sock = sock_from_file(req->file);
>>> @@ -5526,7 +5584,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>>> if (unlikely(ret))
>>> goto out_free;
>>>
>>> - msg.msg_name = NULL;
>>> + msg.msg_name = sr->addr ? (struct sockaddr *) &address : NULL;
>>> msg.msg_control = NULL;
>>> msg.msg_controllen = 0;
>>> msg.msg_namelen = 0;
>>
>> I think namelen should also be updated ?
>
> It doesn't have to be updated. From net/socket.c there is a comment
> like this:
>
> /* We assume all kernel code knows the size of sockaddr_storage */
> msg.msg_namelen = 0;
>
> Full __sys_recvfrom() source code, see here:
> https://github.com/torvalds/linux/blob/v5.16-rc8/net/socket.c#L2085-L2088
>
> I will add the same comment in next series to clarify this one.
>
>>
>>> @@ -5540,6 +5598,16 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>>> min_ret = iov_iter_count(&msg.msg_iter);
>>>
>>> ret = sock_recvmsg(sock, &msg, flags);
>>> +
>>> + if (ret >= 0 && sr->addr != NULL) {
>>> + int tmp;
>>> +
>>> + tmp = move_addr_to_user(&address, msg.msg_namelen, sr->addr,
>>> + sr->recvfrom_addr_len);
>>> + if (unlikely(tmp < 0))
>>> + ret = tmp;
>>> + }
>>> +
>>> out_free:
>>> if (ret < min_ret) {
>>> if (ret == -EAGAIN && force_nonblock)
>>> @@ -6778,9 +6846,11 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>> case IORING_OP_SYNC_FILE_RANGE:
>>> return io_sfr_prep(req, sqe);
>>> case IORING_OP_SENDMSG:
>>> + case IORING_OP_SENDTO:
>>> case IORING_OP_SEND:
>>> return io_sendmsg_prep(req, sqe);
>>> case IORING_OP_RECVMSG:
>>> + case IORING_OP_RECVFROM:
>>> case IORING_OP_RECV:
>>> return io_recvmsg_prep(req, sqe);
>>> case IORING_OP_CONNECT:
>>> @@ -7060,12 +7130,14 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>>> case IORING_OP_SENDMSG:
>>> ret = io_sendmsg(req, issue_flags);
>>> break;
>>> + case IORING_OP_SENDTO:
>>> case IORING_OP_SEND:
>>> ret = io_sendto(req, issue_flags);
>>> break;
>>> case IORING_OP_RECVMSG:
>>> ret = io_recvmsg(req, issue_flags);
>>> break;
>>> + case IORING_OP_RECVFROM:
>>> case IORING_OP_RECV:
>>> ret = io_recvfrom(req, issue_flags);
>>> break;
>>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>>> index efc7ac9b3a6b..a360069d1e8e 100644
>>> --- a/include/uapi/linux/io_uring.h
>>> +++ b/include/uapi/linux/io_uring.h
>>> @@ -150,6 +150,8 @@ enum {
>>> IORING_OP_SETXATTR,
>>> IORING_OP_FGETXATTR,
>>> IORING_OP_GETXATTR,
>>> + IORING_OP_SENDTO,
>>> + IORING_OP_RECVFROM,
>>>
>>> /* this goes last, obviously */
>>> IORING_OP_LAST,
>>
>>
>> Regards,
>>
>> ~Praveen.
>>
>
> Thanks for the review. I will send the RFC v5 soon.
>
On 1/7/22 3:33 PM, Praveen Kumar wrote:
> On 07-01-2022 02:08, Ammar Faizi wrote:
>> On Thu, 6 Jan 2022 at 23:01:59 +0530, Praveen Kumar <[email protected]> wrote:
>>> On 30-12-2021 23:22, Ammar Faizi wrote:
>>>> This adds sendto(2) and recvfrom(2) support for io_uring.
>>>>
>>>> New opcodes:
>>>> IORING_OP_SENDTO
>>>> IORING_OP_RECVFROM
>>>>
>>>> Cc: Nugra <[email protected]>
>>>> Tested-by: Nugra <[email protected]>
>>>> Link: https://github.com/axboe/liburing/issues/397
>>>> Signed-off-by: Ammar Faizi <[email protected]>
>>>> ---
>>>>
>>>> v3:
>>>> - Fix build error when CONFIG_NET is undefined should be done in
>>>> the first patch, not this patch.
>>>>
>>>> - Add Tested-by tag from Nugra.
>>>>
>>>> v2:
>>>> - In `io_recvfrom()`, mark the error check of `move_addr_to_user()`
>>>> call as unlikely.
>>>>
>>>> - Fix build error when CONFIG_NET is undefined.
>>>>
>>>> - Added Nugra to CC list (tester).
>>>> ---
>>>> fs/io_uring.c | 80 +++++++++++++++++++++++++++++++++--
>>>> include/uapi/linux/io_uring.h | 2 +
>>>> 2 files changed, 78 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index 7adcb591398f..3726958f8f58 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -575,7 +575,15 @@ struct io_sr_msg {
>>>> union {
>>>> struct compat_msghdr __user *umsg_compat;
>>>> struct user_msghdr __user *umsg;
>>>> - void __user *buf;
>>>> +
>>>> + struct {
>>>> + void __user *buf;
>>>> + struct sockaddr __user *addr;
>>>> + union {
>>>> + int sendto_addr_len;
>>>> + int __user *recvfrom_addr_len;
>>>> + };
>>>> + };
>>>> };
>>>> int msg_flags;
>>>> int bgid;
>>>> @@ -1133,6 +1141,19 @@ static const struct io_op_def io_op_defs[] = {
>>>> .needs_file = 1
>>>> },
>>>> [IORING_OP_GETXATTR] = {},
>>>> + [IORING_OP_SENDTO] = {
>>>> + .needs_file = 1,
>>>> + .unbound_nonreg_file = 1,
>>>> + .pollout = 1,
>>>> + .audit_skip = 1,
>>>> + },
>>>> + [IORING_OP_RECVFROM] = {
>>>> + .needs_file = 1,
>>>> + .unbound_nonreg_file = 1,
>>>> + .pollin = 1,
>>>> + .buffer_select = 1,
>>>> + .audit_skip = 1,
>>>> + },
>>>> };
>>>>
>>>> /* requests with any of those set should undergo io_disarm_next() */
>>>> @@ -5216,12 +5237,24 @@ static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>>> return -EINVAL;
>>>>
>>>> + /*
>>>> + * For IORING_OP_SEND{,TO}, the assignment to @sr->umsg
>>>> + * is equivalent to an assignment to @sr->buf.
>>>> + */
>>>> sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
>>>> +
>>>> sr->len = READ_ONCE(sqe->len);
>>>> sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
>>>> if (sr->msg_flags & MSG_DONTWAIT)
>>>> req->flags |= REQ_F_NOWAIT;
>>>>
>>>> + if (req->opcode == IORING_OP_SENDTO) {
>>>> + sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
>>>> + sr->sendto_addr_len = READ_ONCE(sqe->addr3);
>>>> + } else {
>>>> + sr->addr = (struct sockaddr __user *) NULL;
>>>
>>> Let's have sendto_addr_len = 0
>>
>> Will do in the RFC v5.
>>
>>>
>>>> + }
>>>> +
>>>> #ifdef CONFIG_COMPAT
>>>> if (req->ctx->compat)
>>>> sr->msg_flags |= MSG_CMSG_COMPAT;
>>>> @@ -5275,6 +5308,7 @@ static int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
>>>>
>>>> static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>>>> {
>>>> + struct sockaddr_storage address;
>>>> struct io_sr_msg *sr = &req->sr_msg;
>>>> struct msghdr msg;
>>>> struct iovec iov;
>>>> @@ -5291,10 +5325,20 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>>>> if (unlikely(ret))
>>>> return ret;
>>>>
>>>> - msg.msg_name = NULL;
>>>> +
>>>> msg.msg_control = NULL;
>>>> msg.msg_controllen = 0;
>>>> - msg.msg_namelen = 0;
>>>> + if (sr->addr) {
>>>> + ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
>>>> + &address);
>>>> + if (unlikely(ret < 0))
>>>> + goto fail;
>>>> + msg.msg_name = (struct sockaddr *) &address;
>>>> + msg.msg_namelen = sr->sendto_addr_len;
>>>> + } else {
>>>> + msg.msg_name = NULL;
>>>> + msg.msg_namelen = 0;
>>>> + }
>>>>
>>>> flags = req->sr_msg.msg_flags;
>>>> if (issue_flags & IO_URING_F_NONBLOCK)
>>>> @@ -5309,6 +5353,7 @@ static int io_sendto(struct io_kiocb *req, unsigned int issue_flags)
>>>> return -EAGAIN;
>>>> if (ret == -ERESTARTSYS)
>>>> ret = -EINTR;
>>>> + fail:
>>>> req_set_fail(req);
>>>
>>> I think there is a problem with "fail" goto statement. Not getting
>>> full clarity on this change. With latest kernel, I see
>>> req_set_fail(req) inside if check, which I don't see here. Can you
>>> please resend the patch on latest kernel version. Thanks.
>>
>> I will send the v5 on top of "for-next" branch in Jens' tree soon.
>>
>> That is already inside an "if check" anyway. We go to that label when
>> the move_addr_to_kernel() fails (most of the time it is -EFAULT or
>> -EINVAL).
>>
>> That part looks like this (note the if check before the goto):
>> ----------------------------------------------------------------------
>> msg.msg_control = NULL;
>> msg.msg_controllen = 0;
>> if (sr->addr) {
>> ret = move_addr_to_kernel(sr->addr, sr->sendto_addr_len,
>> &address);
>> if (unlikely(ret < 0))
>> goto fail;
>> msg.msg_name = (struct sockaddr *) &address;
>> msg.msg_namelen = sr->sendto_addr_len;
>> } else {
>> msg.msg_name = NULL;
>> msg.msg_namelen = 0;
>> }
>>
>> flags = req->sr_msg.msg_flags;
>> if (issue_flags & IO_URING_F_NONBLOCK)
>> flags |= MSG_DONTWAIT;
>> if (flags & MSG_WAITALL)
>> min_ret = iov_iter_count(&msg.msg_iter);
>>
>> msg.msg_flags = flags;
>> ret = sock_sendmsg(sock, &msg);
>> if (ret < min_ret) {
>> if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
>> return -EAGAIN;
>> if (ret == -ERESTARTSYS)
>> ret = -EINTR;
>> fail:
>
> Thanks for sending this. IMO this goto label should be just before
> the "if (ret < min_ret)" statement.
AFAICT, error returned by move_addr_to_kernel are only -EFAULT and -EINVAL,
so the check against -EAGAIN and -ERESTARTSYS is unnecessary for that
case. We can skip that.
RFC v4 here (rebased on top of "for-next" Jens' tree):
https://lore.kernel.org/io-uring/[email protected]/T/
>
>> req_set_fail(req);
>> }
>> __io_req_complete(req, issue_flags, ret, 0);
>> return 0;
>> ----------------------------------------------------------------------
>>
>>>> }
>>>> __io_req_complete(req, issue_flags, ret, 0);
>>>> @@ -5427,13 +5472,25 @@ static int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>> if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>>> return -EINVAL;
>>>>
>>>> + /*
>>>> + * For IORING_OP_RECV{,FROM}, the assignment to @sr->umsg
>>>> + * is equivalent to an assignment to @sr->buf.
>>>> + */
>>>> sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
>>>> +
>>>> sr->len = READ_ONCE(sqe->len);
>>>> sr->bgid = READ_ONCE(sqe->buf_group);
>>>> sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
>>>> if (sr->msg_flags & MSG_DONTWAIT)
>>>> req->flags |= REQ_F_NOWAIT;
>>>>
>>>> + if (req->opcode == IORING_OP_RECVFROM) {
>>>> + sr->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
>>>> + sr->recvfrom_addr_len = u64_to_user_ptr(READ_ONCE(sqe->addr3));
>>>> + } else {
>>>> + sr->addr = (struct sockaddr __user *) NULL;
>>>
>>> I think recvfrom_addr_len should also be pointed to NULL, instead of
>>> garbage for this case.
>>
>> Will do in the RFC v5.
>>
>>>
>>>> + }
>>>> +
>>>> #ifdef CONFIG_COMPAT
>>>> if (req->ctx->compat)
>>>> sr->msg_flags |= MSG_CMSG_COMPAT;
>>>> @@ -5509,6 +5566,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>>>> struct iovec iov;
>>>> unsigned flags;
>>>> int ret, min_ret = 0;
>>>> + struct sockaddr_storage address;
>>>> bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
>>>>
>>>> sock = sock_from_file(req->file);
>>>> @@ -5526,7 +5584,7 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>>>> if (unlikely(ret))
>>>> goto out_free;
>>>>
>>>> - msg.msg_name = NULL;
>>>> + msg.msg_name = sr->addr ? (struct sockaddr *) &address : NULL;
>>>> msg.msg_control = NULL;
>>>> msg.msg_controllen = 0;
>>>> msg.msg_namelen = 0;
>>>
>>> I think namelen should also be updated ?
>>
>> It doesn't have to be updated. From net/socket.c there is a comment
>> like this:
>>
>> /* We assume all kernel code knows the size of sockaddr_storage */
>> msg.msg_namelen = 0;
>>
>> Full __sys_recvfrom() source code, see here:
>> https://github.com/torvalds/linux/blob/v5.16-rc8/net/socket.c#L2085-L2088
>>
>> I will add the same comment in next series to clarify this one.
>>
>>>
>>>> @@ -5540,6 +5598,16 @@ static int io_recvfrom(struct io_kiocb *req, unsigned int issue_flags)
>>>> min_ret = iov_iter_count(&msg.msg_iter);
>>>>
>>>> ret = sock_recvmsg(sock, &msg, flags);
>>>> +
>>>> + if (ret >= 0 && sr->addr != NULL) {
>>>> + int tmp;
>>>> +
>>>> + tmp = move_addr_to_user(&address, msg.msg_namelen, sr->addr,
>>>> + sr->recvfrom_addr_len);
>>>> + if (unlikely(tmp < 0))
>>>> + ret = tmp;
>>>> + }
>>>> +
>>>> out_free:
>>>> if (ret < min_ret) {
>>>> if (ret == -EAGAIN && force_nonblock)
>>>> @@ -6778,9 +6846,11 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>> case IORING_OP_SYNC_FILE_RANGE:
>>>> return io_sfr_prep(req, sqe);
>>>> case IORING_OP_SENDMSG:
>>>> + case IORING_OP_SENDTO:
>>>> case IORING_OP_SEND:
>>>> return io_sendmsg_prep(req, sqe);
>>>> case IORING_OP_RECVMSG:
>>>> + case IORING_OP_RECVFROM:
>>>> case IORING_OP_RECV:
>>>> return io_recvmsg_prep(req, sqe);
>>>> case IORING_OP_CONNECT:
>>>> @@ -7060,12 +7130,14 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>>>> case IORING_OP_SENDMSG:
>>>> ret = io_sendmsg(req, issue_flags);
>>>> break;
>>>> + case IORING_OP_SENDTO:
>>>> case IORING_OP_SEND:
>>>> ret = io_sendto(req, issue_flags);
>>>> break;
>>>> case IORING_OP_RECVMSG:
>>>> ret = io_recvmsg(req, issue_flags);
>>>> break;
>>>> + case IORING_OP_RECVFROM:
>>>> case IORING_OP_RECV:
>>>> ret = io_recvfrom(req, issue_flags);
>>>> break;
>>>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>>>> index efc7ac9b3a6b..a360069d1e8e 100644
>>>> --- a/include/uapi/linux/io_uring.h
>>>> +++ b/include/uapi/linux/io_uring.h
>>>> @@ -150,6 +150,8 @@ enum {
>>>> IORING_OP_SETXATTR,
>>>> IORING_OP_FGETXATTR,
>>>> IORING_OP_GETXATTR,
>>>> + IORING_OP_SENDTO,
>>>> + IORING_OP_RECVFROM,
>>>>
>>>> /* this goes last, obviously */
>>>> IORING_OP_LAST,
>>>
>>>
>>> Regards,
>>>
>>> ~Praveen.
>>>
>>
>> Thanks for the review. I will send the RFC v5 soon.
>>
>
--
Ammar Faizi