Hello,
This patchset includes some updates for SO_RCVLOWAT:
1) af_vsock:
During my experiments with zerocopy receive, i found, that in some
cases, poll() implementation violates POSIX: when socket has non-
default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
POLLRDNORM bits in 'revents' even number of bytes available to read
on socket is smaller than SO_RCVLOWAT value. In this case,user sees
POLLIN flag and then tries to read data(for example using 'read()'
call), but read call will be blocked, because SO_RCVLOWAT logic is
supported in dequeue loop in af_vsock.c. But the same time, POSIX
requires that:
"POLLIN Data other than high-priority data may be read without
blocking.
POLLRDNORM Normal data may be read without blocking."
See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.
So, we have, that poll() syscall returns POLLIN, but read call will
be blocked.
Also in man page socket(7) i found that:
"Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
socket as readable only if at least SO_RCVLOWAT bytes are available."
I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
this case for TCP socket, it works as POSIX required.
I've added some fixes to af_vsock.c and virtio_transport_common.c,
test is also implemented.
2) virtio/vsock:
It adds some optimization to wake ups, when new data arrived. Now,
SO_RCVLOWAT is considered before wake up sleepers who wait new data.
There is no sense, to kick waiter, when number of available bytes
in socket's queue < SO_RCVLOWAT, because if we wake up reader in
this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
exit from poll() will be "spurious". This logic is also used in TCP
sockets.
3) vmci/vsock:
Same as 2), but i'm not sure about this changes. Will be very good,
to get comments from someone who knows this code.
4) Hyper-V:
As Dexuan Cui mentioned, for Hyper-V transport it is difficult to
support SO_RCVLOWAT, so he suggested to disable this feature for
Hyper-V.
Thank You
Arseniy Krasnov(9):
vsock: SO_RCVLOWAT transport set callback
hv_sock: disable SO_RCVLOWAT support
virtio/vsock: use 'target' in notify_poll_in callback
vmci/vsock: use 'target' in notify_poll_in callback
vsock: pass sock_rcvlowat to notify_poll_in as target
vsock: add API call for data ready
virtio/vsock: check SO_RCVLOWAT before wake up reader
vmci/vsock: check SO_RCVLOWAT before wake up reader
vsock_test: POLLIN + SO_RCVLOWAT test
include/net/af_vsock.h | 2 +
net/vmw_vsock/af_vsock.c | 33 +++++++-
net/vmw_vsock/hyperv_transport.c | 7 ++
net/vmw_vsock/virtio_transport_common.c | 7 +-
net/vmw_vsock/vmci_transport_notify.c | 10 +--
net/vmw_vsock/vmci_transport_notify_qstate.c | 12 +--
tools/testing/vsock/vsock_test.c | 108 +++++++++++++++++++++++++++
7 files changed, 162 insertions(+), 17 deletions(-)
Changelog:
v1 -> v2:
1) Patches for VMCI transport(same as for virtio-vsock).
2) Patches for Hyper-V transport(disabling SO_RCVLOWAT setting).
3) Waiting logic in test was updated(sleep() -> poll()).
v2 -> v3:
1) Patches were reordered.
2) Commit message updated in 0005.
3) Check 'transport' pointer in 0001 for NULL.
v3 -> v4:
1) vsock_set_rcvlowat() logic changed. Previous version required
assigned transport and always called its 'set_rcvlowat' callback
(if present). Now, assignment is not needed.
2) 0003,0004,0005,0006,0007,0008 - commit messages updated.
3) 0009 - small refactoring and style fixes.
4) RFC tag was removed.
--
2.25.1
This callback controls setting of POLLIN, POLLRDNORM output bits of poll()
syscall, but in some cases, it is incorrectly to set it, when socket has
at least 1 bytes of available data. Use 'target' which is already exists.
Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
---
net/vmw_vsock/virtio_transport_common.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index ec2c2afbf0d0..8f6356ebcdd1 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -634,10 +634,7 @@ virtio_transport_notify_poll_in(struct vsock_sock *vsk,
size_t target,
bool *data_ready_now)
{
- if (vsock_stream_has_data(vsk))
- *data_ready_now = true;
- else
- *data_ready_now = false;
+ *data_ready_now = vsock_stream_has_data(vsk) >= target;
return 0;
}
--
2.25.1
This adds transport specific callback for SO_RCVLOWAT, because in some
transports it may be difficult to know current available number of bytes
ready to read. Thus, when SO_RCVLOWAT is set, transport may reject it.
Signed-off-by: Arseniy Krasnov <[email protected]>
---
include/net/af_vsock.h | 1 +
net/vmw_vsock/af_vsock.c | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 1c53c4c4d88f..d609a088cb27 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -135,6 +135,7 @@ struct vsock_transport {
u64 (*stream_rcvhiwat)(struct vsock_sock *);
bool (*stream_is_active)(struct vsock_sock *);
bool (*stream_allow)(u32 cid, u32 port);
+ int (*set_rcvlowat)(struct vsock_sock *vsk, int val);
/* SEQ_PACKET. */
ssize_t (*seqpacket_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index f04abf662ec6..0a6777526c73 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -2129,6 +2129,25 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
return err;
}
+static int vsock_set_rcvlowat(struct sock *sk, int val)
+{
+ const struct vsock_transport *transport;
+ struct vsock_sock *vsk;
+
+ vsk = vsock_sk(sk);
+
+ if (val > vsk->buffer_size)
+ return -EINVAL;
+
+ transport = vsk->transport;
+
+ if (transport && transport->set_rcvlowat)
+ return transport->set_rcvlowat(vsk, val);
+
+ WRITE_ONCE(sk->sk_rcvlowat, val ? : 1);
+ return 0;
+}
+
static const struct proto_ops vsock_stream_ops = {
.family = PF_VSOCK,
.owner = THIS_MODULE,
@@ -2148,6 +2167,7 @@ static const struct proto_ops vsock_stream_ops = {
.recvmsg = vsock_connectible_recvmsg,
.mmap = sock_no_mmap,
.sendpage = sock_no_sendpage,
+ .set_rcvlowat = vsock_set_rcvlowat,
};
static const struct proto_ops vsock_seqpacket_ops = {
--
2.25.1
This callback controls setting of POLLIN, POLLRDNORM output bits of poll()
syscall, but in some cases, it is incorrectly to set it, when socket has
at least 1 bytes of available data. Use 'target' which is already exists.
Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Reviewed-by: Vishnu Dasa <[email protected]>
---
net/vmw_vsock/vmci_transport_notify.c | 8 ++++----
net/vmw_vsock/vmci_transport_notify_qstate.c | 8 ++++----
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/net/vmw_vsock/vmci_transport_notify.c b/net/vmw_vsock/vmci_transport_notify.c
index d69fc4b595ad..852097e2b9e6 100644
--- a/net/vmw_vsock/vmci_transport_notify.c
+++ b/net/vmw_vsock/vmci_transport_notify.c
@@ -340,12 +340,12 @@ vmci_transport_notify_pkt_poll_in(struct sock *sk,
{
struct vsock_sock *vsk = vsock_sk(sk);
- if (vsock_stream_has_data(vsk)) {
+ if (vsock_stream_has_data(vsk) >= target) {
*data_ready_now = true;
} else {
- /* We can't read right now because there is nothing in the
- * queue. Ask for notifications when there is something to
- * read.
+ /* We can't read right now because there is not enough data
+ * in the queue. Ask for notifications when there is something
+ * to read.
*/
if (sk->sk_state == TCP_ESTABLISHED) {
if (!send_waiting_read(sk, 1))
diff --git a/net/vmw_vsock/vmci_transport_notify_qstate.c b/net/vmw_vsock/vmci_transport_notify_qstate.c
index 0f36d7c45db3..12f0cb8fe998 100644
--- a/net/vmw_vsock/vmci_transport_notify_qstate.c
+++ b/net/vmw_vsock/vmci_transport_notify_qstate.c
@@ -161,12 +161,12 @@ vmci_transport_notify_pkt_poll_in(struct sock *sk,
{
struct vsock_sock *vsk = vsock_sk(sk);
- if (vsock_stream_has_data(vsk)) {
+ if (vsock_stream_has_data(vsk) >= target) {
*data_ready_now = true;
} else {
- /* We can't read right now because there is nothing in the
- * queue. Ask for notifications when there is something to
- * read.
+ /* We can't read right now because there is not enough data
+ * in the queue. Ask for notifications when there is something
+ * to read.
*/
if (sk->sk_state == TCP_ESTABLISHED)
vsock_block_update_write_window(sk);
--
2.25.1
For Hyper-V it is quiet difficult to support this socket option,due to
transport internals, so disable it.
Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Dexuan Cui <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
---
net/vmw_vsock/hyperv_transport.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index fd98229e3db3..59c3e2697069 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -815,6 +815,12 @@ int hvs_notify_send_post_enqueue(struct vsock_sock *vsk, ssize_t written,
return 0;
}
+static
+int hvs_set_rcvlowat(struct vsock_sock *vsk, int val)
+{
+ return -EOPNOTSUPP;
+}
+
static struct vsock_transport hvs_transport = {
.module = THIS_MODULE,
@@ -850,6 +856,7 @@ static struct vsock_transport hvs_transport = {
.notify_send_pre_enqueue = hvs_notify_send_pre_enqueue,
.notify_send_post_enqueue = hvs_notify_send_post_enqueue,
+ .set_rcvlowat = hvs_set_rcvlowat
};
static bool hvs_check_transport(struct vsock_sock *vsk)
--
2.25.1
This adds 'vsock_data_ready()' which must be called by transport to kick
sleeping data readers. It checks for SO_RCVLOWAT value before waking
user, thus preventing spurious wake ups. Based on 'tcp_data_ready()' logic.
Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
---
include/net/af_vsock.h | 1 +
net/vmw_vsock/af_vsock.c | 10 ++++++++++
2 files changed, 11 insertions(+)
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index d609a088cb27..568a87c5e0d0 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -78,6 +78,7 @@ struct vsock_sock {
s64 vsock_stream_has_data(struct vsock_sock *vsk);
s64 vsock_stream_has_space(struct vsock_sock *vsk);
struct sock *vsock_create_connected(struct sock *parent);
+void vsock_data_ready(struct sock *sk);
/**** TRANSPORT ****/
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index b5cbc849844b..85a665253e84 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -882,6 +882,16 @@ s64 vsock_stream_has_space(struct vsock_sock *vsk)
}
EXPORT_SYMBOL_GPL(vsock_stream_has_space);
+void vsock_data_ready(struct sock *sk)
+{
+ struct vsock_sock *vsk = vsock_sk(sk);
+
+ if (vsock_stream_has_data(vsk) >= sk->sk_rcvlowat ||
+ sock_flag(sk, SOCK_DONE))
+ sk->sk_data_ready(sk);
+}
+EXPORT_SYMBOL_GPL(vsock_data_ready);
+
static int vsock_release(struct socket *sock)
{
__vsock_release(sock->sk, 0);
--
2.25.1
Passing 1 as the target to notify_poll_in(), we don't honor
what the user has set via SO_RCVLOWAT, going to set POLLIN
and POLLRDNORM, even if we don't have the amount of bytes
expected by the user.
Let's use sock_rcvlowat() to get the right target to pass to
notify_poll_in();
Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
---
net/vmw_vsock/af_vsock.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 0a6777526c73..b5cbc849844b 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1066,8 +1066,9 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
if (transport && transport->stream_is_active(vsk) &&
!(sk->sk_shutdown & RCV_SHUTDOWN)) {
bool data_ready_now = false;
+ int target = sock_rcvlowat(sk, 0, INT_MAX);
int ret = transport->notify_poll_in(
- vsk, 1, &data_ready_now);
+ vsk, target, &data_ready_now);
if (ret < 0) {
mask |= EPOLLERR;
} else {
--
2.25.1
This adds extra condition to wake up data reader: do it only when number
of readable bytes >= SO_RCVLOWAT. Otherwise, there is no sense to kick
user, because it will wait until SO_RCVLOWAT bytes will be dequeued. This
check is performed in vsock_data_ready().
Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Vishnu Dasa <[email protected]>
---
net/vmw_vsock/vmci_transport_notify.c | 2 +-
net/vmw_vsock/vmci_transport_notify_qstate.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/vmw_vsock/vmci_transport_notify.c b/net/vmw_vsock/vmci_transport_notify.c
index 852097e2b9e6..7c3a7db134b2 100644
--- a/net/vmw_vsock/vmci_transport_notify.c
+++ b/net/vmw_vsock/vmci_transport_notify.c
@@ -307,7 +307,7 @@ vmci_transport_handle_wrote(struct sock *sk,
struct vsock_sock *vsk = vsock_sk(sk);
PKT_FIELD(vsk, sent_waiting_read) = false;
#endif
- sk->sk_data_ready(sk);
+ vsock_data_ready(sk);
}
static void vmci_transport_notify_pkt_socket_init(struct sock *sk)
diff --git a/net/vmw_vsock/vmci_transport_notify_qstate.c b/net/vmw_vsock/vmci_transport_notify_qstate.c
index 12f0cb8fe998..e96a88d850a8 100644
--- a/net/vmw_vsock/vmci_transport_notify_qstate.c
+++ b/net/vmw_vsock/vmci_transport_notify_qstate.c
@@ -84,7 +84,7 @@ vmci_transport_handle_wrote(struct sock *sk,
bool bottom_half,
struct sockaddr_vm *dst, struct sockaddr_vm *src)
{
- sk->sk_data_ready(sk);
+ vsock_data_ready(sk);
}
static void vsock_block_update_write_window(struct sock *sk)
@@ -282,7 +282,7 @@ vmci_transport_notify_pkt_recv_post_dequeue(
/* See the comment in
* vmci_transport_notify_pkt_send_post_enqueue().
*/
- sk->sk_data_ready(sk);
+ vsock_data_ready(sk);
}
return err;
--
2.25.1
This adds extra condition to wake up data reader: do it only when number
of readable bytes >= SO_RCVLOWAT. Otherwise, there is no sense to kick
user,because it will wait until SO_RCVLOWAT bytes will be dequeued. This
check is performed in vsock_data_ready().
Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
---
net/vmw_vsock/virtio_transport_common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 8f6356ebcdd1..35863132f4f1 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1081,7 +1081,7 @@ virtio_transport_recv_connected(struct sock *sk,
switch (le16_to_cpu(pkt->hdr.op)) {
case VIRTIO_VSOCK_OP_RW:
virtio_transport_recv_enqueue(vsk, pkt);
- sk->sk_data_ready(sk);
+ vsock_data_ready(sk);
return err;
case VIRTIO_VSOCK_OP_CREDIT_REQUEST:
virtio_transport_send_credit_update(vsk);
--
2.25.1
This adds test to check, that when poll() returns POLLIN, POLLRDNORM bits,
next read call won't block.
Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
---
tools/testing/vsock/vsock_test.c | 108 +++++++++++++++++++++++++++++++
1 file changed, 108 insertions(+)
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index dc577461afc2..bb6d691cb30d 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -18,6 +18,7 @@
#include <sys/socket.h>
#include <time.h>
#include <sys/mman.h>
+#include <poll.h>
#include "timeout.h"
#include "control.h"
@@ -596,6 +597,108 @@ static void test_seqpacket_invalid_rec_buffer_server(const struct test_opts *opt
close(fd);
}
+#define RCVLOWAT_BUF_SIZE 128
+
+static void test_stream_poll_rcvlowat_server(const struct test_opts *opts)
+{
+ int fd;
+ int i;
+
+ fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
+ if (fd < 0) {
+ perror("accept");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Send 1 byte. */
+ send_byte(fd, 1, 0);
+
+ control_writeln("SRVSENT");
+
+ /* Wait until client is ready to receive rest of data. */
+ control_expectln("CLNSENT");
+
+ for (i = 0; i < RCVLOWAT_BUF_SIZE - 1; i++)
+ send_byte(fd, 1, 0);
+
+ /* Keep socket in active state. */
+ control_expectln("POLLDONE");
+
+ close(fd);
+}
+
+static void test_stream_poll_rcvlowat_client(const struct test_opts *opts)
+{
+ unsigned long lowat_val = RCVLOWAT_BUF_SIZE;
+ char buf[RCVLOWAT_BUF_SIZE];
+ struct pollfd fds;
+ ssize_t read_res;
+ short poll_flags;
+ int fd;
+
+ fd = vsock_stream_connect(opts->peer_cid, 1234);
+ if (fd < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+
+ if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT,
+ &lowat_val, sizeof(lowat_val))) {
+ perror("setsockopt");
+ exit(EXIT_FAILURE);
+ }
+
+ control_expectln("SRVSENT");
+
+ /* At this point, server sent 1 byte. */
+ fds.fd = fd;
+ poll_flags = POLLIN | POLLRDNORM;
+ fds.events = poll_flags;
+
+ /* Try to wait for 1 sec. */
+ if (poll(&fds, 1, 1000) < 0) {
+ perror("poll");
+ exit(EXIT_FAILURE);
+ }
+
+ /* poll() must return nothing. */
+ if (fds.revents) {
+ fprintf(stderr, "Unexpected poll result %hx\n",
+ fds.revents);
+ exit(EXIT_FAILURE);
+ }
+
+ /* Tell server to send rest of data. */
+ control_writeln("CLNSENT");
+
+ /* Poll for data. */
+ if (poll(&fds, 1, 10000) < 0) {
+ perror("poll");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Only these two bits are expected. */
+ if (fds.revents != poll_flags) {
+ fprintf(stderr, "Unexpected poll result %hx\n",
+ fds.revents);
+ exit(EXIT_FAILURE);
+ }
+
+ /* Use MSG_DONTWAIT, if call is going to wait, EAGAIN
+ * will be returned.
+ */
+ read_res = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
+ if (read_res != RCVLOWAT_BUF_SIZE) {
+ fprintf(stderr, "Unexpected recv result %zi\n",
+ read_res);
+ exit(EXIT_FAILURE);
+ }
+
+ control_writeln("POLLDONE");
+
+ close(fd);
+}
+
static struct test_case test_cases[] = {
{
.name = "SOCK_STREAM connection reset",
@@ -646,6 +749,11 @@ static struct test_case test_cases[] = {
.run_client = test_seqpacket_invalid_rec_buffer_client,
.run_server = test_seqpacket_invalid_rec_buffer_server,
},
+ {
+ .name = "SOCK_STREAM poll() + SO_RCVLOWAT",
+ .run_client = test_stream_poll_rcvlowat_client,
+ .run_server = test_stream_poll_rcvlowat_server,
+ },
{},
};
--
2.25.1
Hello:
This series was applied to netdev/net-next.git (master)
by Paolo Abeni <[email protected]>:
On Fri, 19 Aug 2022 05:21:58 +0000 you wrote:
> Hello,
>
> This patchset includes some updates for SO_RCVLOWAT:
>
> 1) af_vsock:
> During my experiments with zerocopy receive, i found, that in some
> cases, poll() implementation violates POSIX: when socket has non-
> default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
> POLLRDNORM bits in 'revents' even number of bytes available to read
> on socket is smaller than SO_RCVLOWAT value. In this case,user sees
> POLLIN flag and then tries to read data(for example using 'read()'
> call), but read call will be blocked, because SO_RCVLOWAT logic is
> supported in dequeue loop in af_vsock.c. But the same time, POSIX
> requires that:
>
> [...]
Here is the summary with links:
- [net-next,v4,1/9] vsock: SO_RCVLOWAT transport set callback
https://git.kernel.org/netdev/net-next/c/e38f22c860ed
- [net-next,v4,2/9] hv_sock: disable SO_RCVLOWAT support
https://git.kernel.org/netdev/net-next/c/24764f8d3c31
- [net-next,v4,3/9] virtio/vsock: use 'target' in notify_poll_in callback
https://git.kernel.org/netdev/net-next/c/e7a3266c9167
- [net-next,v4,4/9] vmci/vsock: use 'target' in notify_poll_in callback
https://git.kernel.org/netdev/net-next/c/a274f6ff3c5c
- [net-next,v4,5/9] vsock: pass sock_rcvlowat to notify_poll_in as target
https://git.kernel.org/netdev/net-next/c/ee0b3843a269
- [net-next,v4,6/9] vsock: add API call for data ready
https://git.kernel.org/netdev/net-next/c/f2fdcf67aceb
- [net-next,v4,7/9] virtio/vsock: check SO_RCVLOWAT before wake up reader
https://git.kernel.org/netdev/net-next/c/39f1ed33a448
- [net-next,v4,8/9] vmci/vsock: check SO_RCVLOWAT before wake up reader
https://git.kernel.org/netdev/net-next/c/e061aed99855
- [net-next,v4,9/9] vsock_test: POLLIN + SO_RCVLOWAT test
https://git.kernel.org/netdev/net-next/c/b1346338fbae
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
On Tue, 23 Aug 2022 15:14:10 -0400 Stefan Hajnoczi wrote:
> Stefano will be online again on Monday. I suggest we wait for him to
> review this series. If it's urgent, please let me know and I'll take a
> look.
It was already applied, sorry about that. But please continue with
review as if it wasn't. We'll just revert based on Stefano's feedback
as needed.
On Tue, Aug 23, 2022 at 12:18:52PM -0700, Jakub Kicinski wrote:
> On Tue, 23 Aug 2022 15:14:10 -0400 Stefan Hajnoczi wrote:
> > Stefano will be online again on Monday. I suggest we wait for him to
> > review this series. If it's urgent, please let me know and I'll take a
> > look.
>
> It was already applied, sorry about that. But please continue with
> review as if it wasn't. We'll just revert based on Stefano's feedback
> as needed.
Okay, no problem.
Stefan
On Fri, Aug 19, 2022 at 05:21:58AM +0000, Arseniy Krasnov wrote:
> Hello,
>
> This patchset includes some updates for SO_RCVLOWAT:
>
> 1) af_vsock:
> During my experiments with zerocopy receive, i found, that in some
> cases, poll() implementation violates POSIX: when socket has non-
> default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
> POLLRDNORM bits in 'revents' even number of bytes available to read
> on socket is smaller than SO_RCVLOWAT value. In this case,user sees
> POLLIN flag and then tries to read data(for example using 'read()'
> call), but read call will be blocked, because SO_RCVLOWAT logic is
> supported in dequeue loop in af_vsock.c. But the same time, POSIX
> requires that:
>
> "POLLIN Data other than high-priority data may be read without
> blocking.
> POLLRDNORM Normal data may be read without blocking."
>
> See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.
>
> So, we have, that poll() syscall returns POLLIN, but read call will
> be blocked.
>
> Also in man page socket(7) i found that:
>
> "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
> socket as readable only if at least SO_RCVLOWAT bytes are available."
>
> I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
> uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
> this case for TCP socket, it works as POSIX required.
>
> I've added some fixes to af_vsock.c and virtio_transport_common.c,
> test is also implemented.
>
> 2) virtio/vsock:
> It adds some optimization to wake ups, when new data arrived. Now,
> SO_RCVLOWAT is considered before wake up sleepers who wait new data.
> There is no sense, to kick waiter, when number of available bytes
> in socket's queue < SO_RCVLOWAT, because if we wake up reader in
> this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
> or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
> exit from poll() will be "spurious". This logic is also used in TCP
> sockets.
>
> 3) vmci/vsock:
> Same as 2), but i'm not sure about this changes. Will be very good,
> to get comments from someone who knows this code.
>
> 4) Hyper-V:
> As Dexuan Cui mentioned, for Hyper-V transport it is difficult to
> support SO_RCVLOWAT, so he suggested to disable this feature for
> Hyper-V.
>
> Thank You
Hi Arseniy,
Stefano will be online again on Monday. I suggest we wait for him to
review this series. If it's urgent, please let me know and I'll take a
look.
Thanks,
Stefan
> Arseniy Krasnov(9):
> vsock: SO_RCVLOWAT transport set callback
> hv_sock: disable SO_RCVLOWAT support
> virtio/vsock: use 'target' in notify_poll_in callback
> vmci/vsock: use 'target' in notify_poll_in callback
> vsock: pass sock_rcvlowat to notify_poll_in as target
> vsock: add API call for data ready
> virtio/vsock: check SO_RCVLOWAT before wake up reader
> vmci/vsock: check SO_RCVLOWAT before wake up reader
> vsock_test: POLLIN + SO_RCVLOWAT test
>
> include/net/af_vsock.h | 2 +
> net/vmw_vsock/af_vsock.c | 33 +++++++-
> net/vmw_vsock/hyperv_transport.c | 7 ++
> net/vmw_vsock/virtio_transport_common.c | 7 +-
> net/vmw_vsock/vmci_transport_notify.c | 10 +--
> net/vmw_vsock/vmci_transport_notify_qstate.c | 12 +--
> tools/testing/vsock/vsock_test.c | 108 +++++++++++++++++++++++++++
> 7 files changed, 162 insertions(+), 17 deletions(-)
>
> Changelog:
>
> v1 -> v2:
> 1) Patches for VMCI transport(same as for virtio-vsock).
> 2) Patches for Hyper-V transport(disabling SO_RCVLOWAT setting).
> 3) Waiting logic in test was updated(sleep() -> poll()).
>
> v2 -> v3:
> 1) Patches were reordered.
> 2) Commit message updated in 0005.
> 3) Check 'transport' pointer in 0001 for NULL.
>
> v3 -> v4:
> 1) vsock_set_rcvlowat() logic changed. Previous version required
> assigned transport and always called its 'set_rcvlowat' callback
> (if present). Now, assignment is not needed.
> 2) 0003,0004,0005,0006,0007,0008 - commit messages updated.
> 3) 0009 - small refactoring and style fixes.
> 4) RFC tag was removed.
>
> --
> 2.25.1
On Tue, 2022-08-23 at 16:30 -0400, Stefan Hajnoczi wrote:
> On Tue, Aug 23, 2022 at 12:18:52PM -0700, Jakub Kicinski wrote:
> > On Tue, 23 Aug 2022 15:14:10 -0400 Stefan Hajnoczi wrote:
> > > Stefano will be online again on Monday. I suggest we wait for him to
> > > review this series. If it's urgent, please let me know and I'll take a
> > > look.
> >
> > It was already applied, sorry about that. But please continue with
> > review as if it wasn't. We'll just revert based on Stefano's feedback
> > as needed.
>
> Okay, no problem.
For the records, I applied the series since it looked to me Arseniy
addressed all the feedback from Stefano on the first patch (the only
one still lacking an acked-by/reviewed-by tag) and I thought it would
be better avoiding such delay.
We are still early in this net-next cycle, I think it can be improved
incrementally as needed.
Paolo
On 23.08.2022 23:57, Paolo Abeni wrote:
> On Tue, 2022-08-23 at 16:30 -0400, Stefan Hajnoczi wrote:
>> On Tue, Aug 23, 2022 at 12:18:52PM -0700, Jakub Kicinski wrote:
>>> On Tue, 23 Aug 2022 15:14:10 -0400 Stefan Hajnoczi wrote:
>>>> Stefano will be online again on Monday. I suggest we wait for him to
>>>> review this series. If it's urgent, please let me know and I'll take a
>>>> look.
Hi,
sure, nothing urgent, no problem. Let's wait for Stefano! Is there something
wrong with this patchset? I've replaced RFC -> net-next since Stefano reviewed
all patches except 0001 and suggested to use net-next in v4.
>>>
>>> It was already applied, sorry about that. But please continue with
>>> review as if it wasn't. We'll just revert based on Stefano's feedback
>>> as needed.
>>
>> Okay, no problem.
>
> For the records, I applied the series since it looked to me Arseniy
> addressed all the feedback from Stefano on the first patch (the only
> one still lacking an acked-by/reviewed-by tag) and I thought it would
> be better avoiding such delay.
Yes, only 0001 lacks reviewed-by
>
> We are still early in this net-next cycle, I think it can be improved
> incrementally as needed.
>
> Paolo
>
Thank You, Arseniy
On Tue, Aug 23, 2022 at 10:57:01PM +0200, Paolo Abeni wrote:
>On Tue, 2022-08-23 at 16:30 -0400, Stefan Hajnoczi wrote:
>> On Tue, Aug 23, 2022 at 12:18:52PM -0700, Jakub Kicinski wrote:
>> > On Tue, 23 Aug 2022 15:14:10 -0400 Stefan Hajnoczi wrote:
>> > > Stefano will be online again on Monday. I suggest we wait for him to
>> > > review this series. If it's urgent, please let me know and I'll take a
>> > > look.
>> >
>> > It was already applied, sorry about that. But please continue with
>> > review as if it wasn't. We'll just revert based on Stefano's feedback
>> > as needed.
Just back, and I'm fine with this version, so thanks for merging!
I also run tests with virtio-vsock and everything is fine.
>>
>> Okay, no problem.
>
>For the records, I applied the series since it looked to me Arseniy
>addressed all the feedback from Stefano on the first patch (the only
>one still lacking an acked-by/reviewed-by tag) and I thought it would
>be better avoiding such delay.
Yep, from v3 I asked some changes on the first patch that Arseniy
addressed in this version, and we were waiting an ack for VMCI changes
(thanks Vishnu for giving it).
So, it should be fine.
Thanks,
Stefano