2022-07-25 08:00:00

by Arseniy Krasnov

[permalink] [raw]
Subject: [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling

Hello,

This patchset includes some updates for SO_RCVLOWAT:

1) af_vsock:
During my experiments with zerocopy receive, i found, that in some
cases, poll() implementation violates POSIX: when socket has non-
default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
POLLRDNORM bits in 'revents' even number of bytes available to read
on socket is smaller than SO_RCVLOWAT value. In this case,user sees
POLLIN flag and then tries to read data(for example using 'read()'
call), but read call will be blocked, because SO_RCVLOWAT logic is
supported in dequeue loop in af_vsock.c. But the same time, POSIX
requires that:

"POLLIN Data other than high-priority data may be read without
blocking.
POLLRDNORM Normal data may be read without blocking."

See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.

So, we have, that poll() syscall returns POLLIN, but read call will
be blocked.

Also in man page socket(7) i found that:

"Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
socket as readable only if at least SO_RCVLOWAT bytes are available."

I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
this case for TCP socket, it works as POSIX required.

I've added some fixes to af_vsock.c and virtio_transport_common.c,
test is also implemented.

2) virtio/vsock:
It adds some optimization to wake ups, when new data arrived. Now,
SO_RCVLOWAT is considered before wake up sleepers who wait new data.
There is no sense, to kick waiter, when number of available bytes
in socket's queue < SO_RCVLOWAT, because if we wake up reader in
this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
exit from poll() will be "spurious". This logic is also used in TCP
sockets.

3) vmci/vsock:
Same as 2), but i'm not sure about this changes. Will be very good,
to get comments from someone who knows this code.

4) Hyper-V:
As Dexuan Cui mentioned, for Hyper-V transport it is difficult to
support SO_RCVLOWAT, so he suggested to disable this feature for
Hyper-V.

Thank You

Arseniy Krasnov(9):
vsock: use sk_rcvlowat to set POLLIN/POLLRDNORM
virtio/vsock: use 'target' in notify_poll_in callback
vmci/vsock: use 'target' in notify_poll_in callback
vsock_test: POLLIN + SO_RCVLOWAT test
vsock: SO_RCVLOWAT transport set callback
hv_sock: disable SO_RCVLOWAT support
vsock: add API call for data ready
virtio/vsock: check SO_RCVLOWAT before wake up reader
vmci/vsock: check SO_RCVLOWAT before wake up reader

include/net/af_vsock.h | 2 +
net/vmw_vsock/af_vsock.c | 32 +++++++-
net/vmw_vsock/hyperv_transport.c | 7 ++
net/vmw_vsock/virtio_transport_common.c | 7 +-
net/vmw_vsock/vmci_transport_notify.c | 4 +-
net/vmw_vsock/vmci_transport_notify_qstate.c | 6 +-
tools/testing/vsock/vsock_test.c | 107 +++++++++++++++++++++++++++
7 files changed, 154 insertions(+), 11 deletions(-)

--
2.25.1


2022-07-25 08:01:16

by Arseniy Krasnov

[permalink] [raw]
Subject: [RFC PATCH v2 2/9] virtio/vsock: use 'target' in notify_poll_in, callback

This callback controls setting of POLLIN,POLLRDNORM output bits of poll()
syscall,but in some cases,it is incorrectly to set it, when socket has
at least 1 bytes of available data. Use 'target' which is already exists
and equal to sk_rcvlowat in this case.

Signed-off-by: Arseniy Krasnov <[email protected]>
---
net/vmw_vsock/virtio_transport_common.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index ec2c2afbf0d0..8f6356ebcdd1 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -634,10 +634,7 @@ virtio_transport_notify_poll_in(struct vsock_sock *vsk,
size_t target,
bool *data_ready_now)
{
- if (vsock_stream_has_data(vsk))
- *data_ready_now = true;
- else
- *data_ready_now = false;
+ *data_ready_now = vsock_stream_has_data(vsk) >= target;

return 0;
}
--
2.25.1

2022-07-25 08:15:39

by Arseniy Krasnov

[permalink] [raw]
Subject: [RFC PATCH v2 4/9] vsock_test: POLLIN + SO_RCVLOWAT test

This adds test to check,that when poll() returns POLLIN,POLLRDNORM bits,
next read call won't block.

Signed-off-by: Arseniy Krasnov <[email protected]>
---
tools/testing/vsock/vsock_test.c | 107 +++++++++++++++++++++++++++++++
1 file changed, 107 insertions(+)

diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index dc577461afc2..920dc5d5d979 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -18,6 +18,7 @@
#include <sys/socket.h>
#include <time.h>
#include <sys/mman.h>
+#include <poll.h>

#include "timeout.h"
#include "control.h"
@@ -596,6 +597,107 @@ static void test_seqpacket_invalid_rec_buffer_server(const struct test_opts *opt
close(fd);
}

+static void test_stream_poll_rcvlowat_server(const struct test_opts *opts)
+{
+#define RCVLOWAT_BUF_SIZE 128
+ int fd;
+ int i;
+
+ fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
+ if (fd < 0) {
+ perror("accept");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Send 1 byte. */
+ send_byte(fd, 1, 0);
+
+ control_writeln("SRVSENT");
+
+ /* Wait until client is ready to receive rest of data. */
+ control_expectln("CLNSENT");
+
+ for (i = 0; i < RCVLOWAT_BUF_SIZE - 1; i++)
+ send_byte(fd, 1, 0);
+
+ /* Keep socket in active state. */
+ control_expectln("POLLDONE");
+
+ close(fd);
+}
+
+static void test_stream_poll_rcvlowat_client(const struct test_opts *opts)
+{
+ unsigned long lowat_val = RCVLOWAT_BUF_SIZE;
+ char buf[RCVLOWAT_BUF_SIZE];
+ struct pollfd fds;
+ ssize_t read_res;
+ short poll_flags;
+ int fd;
+
+ fd = vsock_stream_connect(opts->peer_cid, 1234);
+ if (fd < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+
+ if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT,
+ &lowat_val, sizeof(lowat_val))) {
+ perror("setsockopt");
+ exit(EXIT_FAILURE);
+ }
+
+ control_expectln("SRVSENT");
+
+ /* At this point, server sent 1 byte. */
+ fds.fd = fd;
+ poll_flags = POLLIN | POLLRDNORM;
+ fds.events = poll_flags;
+
+ /* Try to wait for 1 sec. */
+ if (poll(&fds, 1, 1000) < 0) {
+ perror("poll");
+ exit(EXIT_FAILURE);
+ }
+
+ /* poll() must return nothing. */
+ if (fds.revents) {
+ fprintf(stderr, "Unexpected poll result %hx\n",
+ fds.revents);
+ exit(EXIT_FAILURE);
+ }
+
+ /* Tell server to send rest of data. */
+ control_writeln("CLNSENT");
+
+ /* Poll for data. */
+ if (poll(&fds, 1, 10000) < 0) {
+ perror("poll");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Only these two bits are expected. */
+ if (fds.revents != poll_flags) {
+ fprintf(stderr, "Unexpected poll result %hx\n",
+ fds.revents);
+ exit(EXIT_FAILURE);
+ }
+
+ /* Use MSG_DONTWAIT, if call is going to wait, EAGAIN
+ * will be returned.
+ */
+ read_res = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
+ if (read_res != RCVLOWAT_BUF_SIZE) {
+ fprintf(stderr, "Unexpected recv result %zi\n",
+ read_res);
+ exit(EXIT_FAILURE);
+ }
+
+ control_writeln("POLLDONE");
+
+ close(fd);
+}
+
static struct test_case test_cases[] = {
{
.name = "SOCK_STREAM connection reset",
@@ -646,6 +748,11 @@ static struct test_case test_cases[] = {
.run_client = test_seqpacket_invalid_rec_buffer_client,
.run_server = test_seqpacket_invalid_rec_buffer_server,
},
+ {
+ .name = "SOCK_STREAM poll() + SO_RCVLOWAT",
+ .run_client = test_stream_poll_rcvlowat_client,
+ .run_server = test_stream_poll_rcvlowat_server,
+ },
{},
};

--
2.25.1

2022-07-25 08:19:40

by Arseniy Krasnov

[permalink] [raw]
Subject: [RFC PATCH v2 5/9] vsock: SO_RCVLOWAT transport set callback

This adds transport specific callback for SO_RCVLOWAT, because in some
transports it may be difficult to know current available number of bytes
ready to read. Thus, when SO_RCVLOWAT is set, transport may reject it.

Signed-off-by: Arseniy Krasnov <[email protected]>
---
include/net/af_vsock.h | 1 +
net/vmw_vsock/af_vsock.c | 19 +++++++++++++++++++
2 files changed, 20 insertions(+)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index f742e50207fb..eae5874bae35 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -134,6 +134,7 @@ struct vsock_transport {
u64 (*stream_rcvhiwat)(struct vsock_sock *);
bool (*stream_is_active)(struct vsock_sock *);
bool (*stream_allow)(u32 cid, u32 port);
+ int (*set_rcvlowat)(struct vsock_sock *, int);

/* SEQ_PACKET. */
ssize_t (*seqpacket_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 63a13fa2686a..b7a286db4af1 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -2130,6 +2130,24 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
return err;
}

+static int vsock_set_rcvlowat(struct sock *sk, int val)
+{
+ const struct vsock_transport *transport;
+ struct vsock_sock *vsk;
+ int err = 0;
+
+ vsk = vsock_sk(sk);
+ transport = vsk->transport;
+
+ if (transport->set_rcvlowat)
+ err = transport->set_rcvlowat(vsk, val);
+
+ if (!err)
+ WRITE_ONCE(sk->sk_rcvlowat, val ? : 1);
+
+ return err;
+}
+
static const struct proto_ops vsock_stream_ops = {
.family = PF_VSOCK,
.owner = THIS_MODULE,
@@ -2149,6 +2167,7 @@ static const struct proto_ops vsock_stream_ops = {
.recvmsg = vsock_connectible_recvmsg,
.mmap = sock_no_mmap,
.sendpage = sock_no_sendpage,
+ .set_rcvlowat = vsock_set_rcvlowat,
};

static const struct proto_ops vsock_seqpacket_ops = {
--
2.25.1

2022-07-25 08:20:20

by Arseniy Krasnov

[permalink] [raw]
Subject: [RFC PATCH v2 7/9] vsock: add API call for data ready

This adds 'vsock_data_ready()' which must be called by transport to kick
sleeping data readers. It checks for SO_RCVLOWAT value before waking
user,thus preventing spurious wake ups.Based on 'tcp_data_ready()' logic.

Signed-off-by: Arseniy Krasnov <[email protected]>
---
include/net/af_vsock.h | 1 +
net/vmw_vsock/af_vsock.c | 10 ++++++++++
2 files changed, 11 insertions(+)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index eae5874bae35..7b79fc5164cc 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -77,6 +77,7 @@ struct vsock_sock {
s64 vsock_stream_has_data(struct vsock_sock *vsk);
s64 vsock_stream_has_space(struct vsock_sock *vsk);
struct sock *vsock_create_connected(struct sock *parent);
+void vsock_data_ready(struct sock *sk);

/**** TRANSPORT ****/

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index b7a286db4af1..4b3ec3f9383f 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -882,6 +882,16 @@ s64 vsock_stream_has_space(struct vsock_sock *vsk)
}
EXPORT_SYMBOL_GPL(vsock_stream_has_space);

+void vsock_data_ready(struct sock *sk)
+{
+ struct vsock_sock *vsk = vsock_sk(sk);
+
+ if (vsock_stream_has_data(vsk) >= sk->sk_rcvlowat ||
+ sock_flag(sk, SOCK_DONE))
+ sk->sk_data_ready(sk);
+}
+EXPORT_SYMBOL_GPL(vsock_data_ready);
+
static int vsock_release(struct socket *sock)
{
__vsock_release(sock->sk, 0);
--
2.25.1

2022-07-25 08:40:31

by Arseniy Krasnov

[permalink] [raw]
Subject: [RFC PATCH v2 1/9] vsock: use sk_rcvlowat to set POLLIN/POLLRDNORM

Both bits indicate, that next data read call won't be blocked, but when
sk_rcvlowat is not 1, these bits will be set by poll anyway, thus when
user tries to dequeue data,it will wait until sk_rcvlowat bytes of data
will be available.

Signed-off-by: Arseniy Krasnov <[email protected]>
---
net/vmw_vsock/af_vsock.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index f04abf662ec6..63a13fa2686a 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1066,8 +1066,9 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
if (transport && transport->stream_is_active(vsk) &&
!(sk->sk_shutdown & RCV_SHUTDOWN)) {
bool data_ready_now = false;
+ int target = sock_rcvlowat(sk, 0, INT_MAX);
int ret = transport->notify_poll_in(
- vsk, 1, &data_ready_now);
+ vsk, target, &data_ready_now);
if (ret < 0) {
mask |= EPOLLERR;
} else {
--
2.25.1

2022-07-25 08:41:45

by Arseniy Krasnov

[permalink] [raw]
Subject: [RFC PATCH v2 3/9] vmci/vsock: use 'target' in notify_poll_in, callback

This callback controls setting of POLLIN,POLLRDNORM output bits of poll()
syscall,but in some cases,it is incorrectly to set it, when socket has
at least 1 bytes of available data. Use 'target' which is already exists
and equal to sk_rcvlowat in this case.

Signed-off-by: Arseniy Krasnov <[email protected]>
---
net/vmw_vsock/vmci_transport_notify.c | 2 +-
net/vmw_vsock/vmci_transport_notify_qstate.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/vmci_transport_notify.c b/net/vmw_vsock/vmci_transport_notify.c
index d69fc4b595ad..1684b85b0660 100644
--- a/net/vmw_vsock/vmci_transport_notify.c
+++ b/net/vmw_vsock/vmci_transport_notify.c
@@ -340,7 +340,7 @@ vmci_transport_notify_pkt_poll_in(struct sock *sk,
{
struct vsock_sock *vsk = vsock_sk(sk);

- if (vsock_stream_has_data(vsk)) {
+ if (vsock_stream_has_data(vsk) >= target) {
*data_ready_now = true;
} else {
/* We can't read right now because there is nothing in the
diff --git a/net/vmw_vsock/vmci_transport_notify_qstate.c b/net/vmw_vsock/vmci_transport_notify_qstate.c
index 0f36d7c45db3..a40407872b53 100644
--- a/net/vmw_vsock/vmci_transport_notify_qstate.c
+++ b/net/vmw_vsock/vmci_transport_notify_qstate.c
@@ -161,7 +161,7 @@ vmci_transport_notify_pkt_poll_in(struct sock *sk,
{
struct vsock_sock *vsk = vsock_sk(sk);

- if (vsock_stream_has_data(vsk)) {
+ if (vsock_stream_has_data(vsk) >= target) {
*data_ready_now = true;
} else {
/* We can't read right now because there is nothing in the
--
2.25.1

2022-07-25 08:41:57

by Arseniy Krasnov

[permalink] [raw]
Subject: [RFC PATCH v2 6/9] hv_sock: disable SO_RCVLOWAT support

For Hyper-V it is quiet difficult to support this socket option,due to
transport internals, so disable it.

Signed-off-by: Arseniy Krasnov <[email protected]>
---
net/vmw_vsock/hyperv_transport.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index e111e13b6660..5fab8f356a86 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -802,6 +802,12 @@ int hvs_notify_send_post_enqueue(struct vsock_sock *vsk, ssize_t written,
return 0;
}

+static
+int hvs_set_rcvlowat(struct vsock_sock *vsk, int val)
+{
+ return -EOPNOTSUPP;
+}
+
static struct vsock_transport hvs_transport = {
.module = THIS_MODULE,

@@ -837,6 +843,7 @@ static struct vsock_transport hvs_transport = {
.notify_send_pre_enqueue = hvs_notify_send_pre_enqueue,
.notify_send_post_enqueue = hvs_notify_send_post_enqueue,

+ .set_rcvlowat = hvs_set_rcvlowat
};

static bool hvs_check_transport(struct vsock_sock *vsk)
--
2.25.1

2022-07-25 09:01:10

by Arseniy Krasnov

[permalink] [raw]
Subject: [RFC PATCH v2 8/9] virtio/vsock: check SO_RCVLOWAT before wake up reader

This adds extra condition to wake up data reader: do it only when number
of readable bytes >= SO_RCVLOWAT. Otherwise, there is no sense to kick
user,because it will wait until SO_RCVLOWAT bytes will be dequeued.

Signed-off-by: Arseniy Krasnov <[email protected]>
---
net/vmw_vsock/virtio_transport_common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 8f6356ebcdd1..35863132f4f1 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1081,7 +1081,7 @@ virtio_transport_recv_connected(struct sock *sk,
switch (le16_to_cpu(pkt->hdr.op)) {
case VIRTIO_VSOCK_OP_RW:
virtio_transport_recv_enqueue(vsk, pkt);
- sk->sk_data_ready(sk);
+ vsock_data_ready(sk);
return err;
case VIRTIO_VSOCK_OP_CREDIT_REQUEST:
virtio_transport_send_credit_update(vsk);
--
2.25.1

2022-07-25 09:14:25

by Arseniy Krasnov

[permalink] [raw]
Subject: [RFC PATCH v2 9/9] vmci/vsock: check SO_RCVLOWAT before wake up reader

This adds extra condition to wake up data reader: do it only when number
of readable bytes >= SO_RCVLOWAT. Otherwise, there is no sense to kick
user,because it will wait until SO_RCVLOWAT bytes will be dequeued.

Signed-off-by: Arseniy Krasnov <[email protected]>
---
net/vmw_vsock/vmci_transport_notify.c | 2 +-
net/vmw_vsock/vmci_transport_notify_qstate.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/vmw_vsock/vmci_transport_notify.c b/net/vmw_vsock/vmci_transport_notify.c
index 1684b85b0660..ab42d73ab3da 100644
--- a/net/vmw_vsock/vmci_transport_notify.c
+++ b/net/vmw_vsock/vmci_transport_notify.c
@@ -307,7 +307,7 @@ vmci_transport_handle_wrote(struct sock *sk,
struct vsock_sock *vsk = vsock_sk(sk);
PKT_FIELD(vsk, sent_waiting_read) = false;
#endif
- sk->sk_data_ready(sk);
+ vsock_data_ready(sk);
}

static void vmci_transport_notify_pkt_socket_init(struct sock *sk)
diff --git a/net/vmw_vsock/vmci_transport_notify_qstate.c b/net/vmw_vsock/vmci_transport_notify_qstate.c
index a40407872b53..41dca5fbea5e 100644
--- a/net/vmw_vsock/vmci_transport_notify_qstate.c
+++ b/net/vmw_vsock/vmci_transport_notify_qstate.c
@@ -84,7 +84,7 @@ vmci_transport_handle_wrote(struct sock *sk,
bool bottom_half,
struct sockaddr_vm *dst, struct sockaddr_vm *src)
{
- sk->sk_data_ready(sk);
+ vsock_data_ready(sk);
}

static void vsock_block_update_write_window(struct sock *sk)
@@ -282,7 +282,7 @@ vmci_transport_notify_pkt_recv_post_dequeue(
/* See the comment in
* vmci_transport_notify_pkt_send_post_enqueue().
*/
- sk->sk_data_ready(sk);
+ vsock_data_ready(sk);
}

return err;
--
2.25.1

2022-07-27 12:29:22

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/9] vsock: use sk_rcvlowat to set POLLIN/POLLRDNORM

On Mon, Jul 25, 2022 at 07:56:59AM +0000, Arseniy Krasnov wrote:
>Both bits indicate, that next data read call won't be blocked, but when
>sk_rcvlowat is not 1, these bits will be set by poll anyway, thus when
>user tries to dequeue data,it will wait until sk_rcvlowat bytes of data
>will be available.
>

The patch LGTM, but I suggest you to rewrite the title and commit of the
message to better explain what this patch does (pass sock_rcvlowat to
notify_poll_in as target) and then explain why as you already did (to
set POLLIN/POLLRDNORM only when target is reached).

Thanks,
Stefano

>Signed-off-by: Arseniy Krasnov <[email protected]>
>---
> net/vmw_vsock/af_vsock.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index f04abf662ec6..63a13fa2686a 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -1066,8 +1066,9 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
> if (transport && transport->stream_is_active(vsk) &&
> !(sk->sk_shutdown & RCV_SHUTDOWN)) {
> bool data_ready_now = false;
>+ int target = sock_rcvlowat(sk, 0, INT_MAX);
> int ret = transport->notify_poll_in(
>- vsk, 1, &data_ready_now);
>+ vsk, target, &data_ready_now);
> if (ret < 0) {
> mask |= EPOLLERR;
> } else {
>--
>2.25.1

2022-07-27 12:29:39

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [RFC PATCH v2 3/9] vmci/vsock: use 'target' in notify_poll_in, callback

@Jorgen can you take a look at this series, especially this patch?

Maybe we need to update the comments in the else branch, something like
s/there is nothing/there is not enough data

Thanks,
Stefano

On Mon, Jul 25, 2022 at 08:01:01AM +0000, Arseniy Krasnov wrote:
>This callback controls setting of POLLIN,POLLRDNORM output bits of poll()
>syscall,but in some cases,it is incorrectly to set it, when socket has
>at least 1 bytes of available data. Use 'target' which is already exists
>and equal to sk_rcvlowat in this case.
>
>Signed-off-by: Arseniy Krasnov <[email protected]>
>---
> net/vmw_vsock/vmci_transport_notify.c | 2 +-
> net/vmw_vsock/vmci_transport_notify_qstate.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/net/vmw_vsock/vmci_transport_notify.c b/net/vmw_vsock/vmci_transport_notify.c
>index d69fc4b595ad..1684b85b0660 100644
>--- a/net/vmw_vsock/vmci_transport_notify.c
>+++ b/net/vmw_vsock/vmci_transport_notify.c
>@@ -340,7 +340,7 @@ vmci_transport_notify_pkt_poll_in(struct sock *sk,
> {
> struct vsock_sock *vsk = vsock_sk(sk);
>
>- if (vsock_stream_has_data(vsk)) {
>+ if (vsock_stream_has_data(vsk) >= target) {
> *data_ready_now = true;
> } else {
> /* We can't read right now because there is nothing in the
>diff --git a/net/vmw_vsock/vmci_transport_notify_qstate.c b/net/vmw_vsock/vmci_transport_notify_qstate.c
>index 0f36d7c45db3..a40407872b53 100644
>--- a/net/vmw_vsock/vmci_transport_notify_qstate.c
>+++ b/net/vmw_vsock/vmci_transport_notify_qstate.c
>@@ -161,7 +161,7 @@ vmci_transport_notify_pkt_poll_in(struct sock *sk,
> {
> struct vsock_sock *vsk = vsock_sk(sk);
>
>- if (vsock_stream_has_data(vsk)) {
>+ if (vsock_stream_has_data(vsk) >= target) {
> *data_ready_now = true;
> } else {
> /* We can't read right now because there is nothing in the
>--
>2.25.1

2022-07-27 12:48:33

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [RFC PATCH v2 5/9] vsock: SO_RCVLOWAT transport set callback

On Mon, Jul 25, 2022 at 08:05:28AM +0000, Arseniy Krasnov wrote:
>This adds transport specific callback for SO_RCVLOWAT, because in some
>transports it may be difficult to know current available number of bytes
>ready to read. Thus, when SO_RCVLOWAT is set, transport may reject it.
>
>Signed-off-by: Arseniy Krasnov <[email protected]>
>---
> include/net/af_vsock.h | 1 +
> net/vmw_vsock/af_vsock.c | 19 +++++++++++++++++++
> 2 files changed, 20 insertions(+)
>
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index f742e50207fb..eae5874bae35 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -134,6 +134,7 @@ struct vsock_transport {
> u64 (*stream_rcvhiwat)(struct vsock_sock *);
> bool (*stream_is_active)(struct vsock_sock *);
> bool (*stream_allow)(u32 cid, u32 port);
>+ int (*set_rcvlowat)(struct vsock_sock *, int);
>
> /* SEQ_PACKET. */
> ssize_t (*seqpacket_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 63a13fa2686a..b7a286db4af1 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -2130,6 +2130,24 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
> return err;
> }
>
>+static int vsock_set_rcvlowat(struct sock *sk, int val)
>+{
>+ const struct vsock_transport *transport;
>+ struct vsock_sock *vsk;
>+ int err = 0;
>+
>+ vsk = vsock_sk(sk);
>+ transport = vsk->transport;

`transport` can be NULL if the user call SO_RCVLOWAT before we assign
it, so we should check it.

I think if the transport implements `set_rcvlowat`, maybe we should set
there sk->sk_rcvlowat, so I would do something like that:

if (transport && transport->set_rcvlowat)
err = transport->set_rcvlowat(vsk, val);
else
WRITE_ONCE(sk->sk_rcvlowat, val ? : 1);

return err;

In addition I think we should check that val does not exceed
vsk->buffer_size, something similar of what tcp_set_rcvlowat() does.

Thanks,
Stefano

2022-07-27 13:20:06

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [RFC PATCH v2 3/9] vmci/vsock: use 'target' in notify_poll_in, callback

On Wed, Jul 27, 2022 at 2:22 PM Stefano Garzarella <[email protected]> wrote:
>
> @Jorgen can you take a look at this series, especially this patch?

Jorgen's email bounced back, so I'm CCing VMCI maintainers.

Bryan, Rajesh, Vishnu, can you take a look?

Thanks,
Stefano

>
> Maybe we need to update the comments in the else branch, something like
> s/there is nothing/there is not enough data
>
> Thanks,
> Stefano
>
> On Mon, Jul 25, 2022 at 08:01:01AM +0000, Arseniy Krasnov wrote:
> >This callback controls setting of POLLIN,POLLRDNORM output bits of poll()
> >syscall,but in some cases,it is incorrectly to set it, when socket has
> >at least 1 bytes of available data. Use 'target' which is already exists
> >and equal to sk_rcvlowat in this case.
> >
> >Signed-off-by: Arseniy Krasnov <[email protected]>
> >---
> > net/vmw_vsock/vmci_transport_notify.c | 2 +-
> > net/vmw_vsock/vmci_transport_notify_qstate.c | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> >diff --git a/net/vmw_vsock/vmci_transport_notify.c b/net/vmw_vsock/vmci_transport_notify.c
> >index d69fc4b595ad..1684b85b0660 100644
> >--- a/net/vmw_vsock/vmci_transport_notify.c
> >+++ b/net/vmw_vsock/vmci_transport_notify.c
> >@@ -340,7 +340,7 @@ vmci_transport_notify_pkt_poll_in(struct sock *sk,
> > {
> > struct vsock_sock *vsk = vsock_sk(sk);
> >
> >- if (vsock_stream_has_data(vsk)) {
> >+ if (vsock_stream_has_data(vsk) >= target) {
> > *data_ready_now = true;
> > } else {
> > /* We can't read right now because there is nothing in the
> >diff --git a/net/vmw_vsock/vmci_transport_notify_qstate.c b/net/vmw_vsock/vmci_transport_notify_qstate.c
> >index 0f36d7c45db3..a40407872b53 100644
> >--- a/net/vmw_vsock/vmci_transport_notify_qstate.c
> >+++ b/net/vmw_vsock/vmci_transport_notify_qstate.c
> >@@ -161,7 +161,7 @@ vmci_transport_notify_pkt_poll_in(struct sock *sk,
> > {
> > struct vsock_sock *vsk = vsock_sk(sk);
> >
> >- if (vsock_stream_has_data(vsk)) {
> >+ if (vsock_stream_has_data(vsk) >= target) {
> > *data_ready_now = true;
> > } else {
> > /* We can't read right now because there is nothing in the
> >--
> >2.25.1

2022-07-27 13:44:56

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling

Hi Arseniy,

On Mon, Jul 25, 2022 at 07:54:05AM +0000, Arseniy Krasnov wrote:
>Hello,
>
>This patchset includes some updates for SO_RCVLOWAT:
>
>1) af_vsock:
> During my experiments with zerocopy receive, i found, that in some
> cases, poll() implementation violates POSIX: when socket has non-
> default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
> POLLRDNORM bits in 'revents' even number of bytes available to read
> on socket is smaller than SO_RCVLOWAT value. In this case,user sees
> POLLIN flag and then tries to read data(for example using 'read()'
> call), but read call will be blocked, because SO_RCVLOWAT logic is
> supported in dequeue loop in af_vsock.c. But the same time, POSIX
> requires that:
>
> "POLLIN Data other than high-priority data may be read without
> blocking.
> POLLRDNORM Normal data may be read without blocking."
>
> See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.
>
> So, we have, that poll() syscall returns POLLIN, but read call will
> be blocked.
>
> Also in man page socket(7) i found that:
>
> "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
> socket as readable only if at least SO_RCVLOWAT bytes are available."
>
> I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
> uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
> this case for TCP socket, it works as POSIX required.
>
> I've added some fixes to af_vsock.c and virtio_transport_common.c,
> test is also implemented.
>
>2) virtio/vsock:
> It adds some optimization to wake ups, when new data arrived. Now,
> SO_RCVLOWAT is considered before wake up sleepers who wait new data.
> There is no sense, to kick waiter, when number of available bytes
> in socket's queue < SO_RCVLOWAT, because if we wake up reader in
> this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
> or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
> exit from poll() will be "spurious". This logic is also used in TCP
> sockets.

Nice, it looks good!

>
>3) vmci/vsock:
> Same as 2), but i'm not sure about this changes. Will be very good,
> to get comments from someone who knows this code.

I CCed VMCI maintainers to the patch and also to this cover, maybe
better to keep them in the loop for next versions.

(Jorgen's and Rajesh's emails bounced back, so I'm CCing here only
Bryan, Vishnu, and [email protected])

>
>4) Hyper-V:
> As Dexuan Cui mentioned, for Hyper-V transport it is difficult to
> support SO_RCVLOWAT, so he suggested to disable this feature for
> Hyper-V.

I left a couple of comments in some patches, but it seems to me to be in
a good state :-)

I would just suggest a bit of a re-organization of the series (the
patches are fine, just the order):
- introduce vsock_set_rcvlowat()
- disabling it for hv_sock
- use 'target' in virtio transports
- use 'target' in vmci transports
- use sock_rcvlowat in vsock_poll()
?I think is better to pass sock_rcvlowat() as 'target' when the
transports are already able to use it
- add vsock_data_ready()
- use vsock_data_ready() in virtio transports
- use vsock_data_ready() in vmci transports
- tests

What do you think?

Thanks,
Stefano

2022-07-28 06:07:26

by Arseniy Krasnov

[permalink] [raw]
Subject: Re: [RFC PATCH v2 1/9] vsock: use sk_rcvlowat to set POLLIN/POLLRDNORM

On 27.07.2022 15:17, Stefano Garzarella wrote:
> On Mon, Jul 25, 2022 at 07:56:59AM +0000, Arseniy Krasnov wrote:
>> Both bits indicate, that next data read call won't be blocked, but when
>> sk_rcvlowat is not 1, these bits will be set by poll anyway, thus when
>> user tries to dequeue data,it will wait until sk_rcvlowat bytes of data
>> will be available.
>>
>
> The patch LGTM, but I suggest you to rewrite the title and commit of the message to better explain what this patch does (pass sock_rcvlowat to notify_poll_in as target) and then explain why as you already did (to set POLLIN/POLLRDNORM only when target is reached).
Ok, i see. Ack
>
> Thanks,
> Stefano
>
>> Signed-off-by: Arseniy Krasnov <[email protected]>
>> ---
>> net/vmw_vsock/af_vsock.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> index f04abf662ec6..63a13fa2686a 100644
>> --- a/net/vmw_vsock/af_vsock.c
>> +++ b/net/vmw_vsock/af_vsock.c
>> @@ -1066,8 +1066,9 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
>>         if (transport && transport->stream_is_active(vsk) &&
>>             !(sk->sk_shutdown & RCV_SHUTDOWN)) {
>>             bool data_ready_now = false;
>> +            int target = sock_rcvlowat(sk, 0, INT_MAX);
>>             int ret = transport->notify_poll_in(
>> -                    vsk, 1, &data_ready_now);
>> +                    vsk, target, &data_ready_now);
>>             if (ret < 0) {
>>                 mask |= EPOLLERR;
>>             } else {
>> -- 
>> 2.25.1
>

2022-07-28 06:19:57

by Arseniy Krasnov

[permalink] [raw]
Subject: Re: [RFC PATCH v2 3/9] vmci/vsock: use 'target' in notify_poll_in, callback

On 27.07.2022 15:22, Stefano Garzarella wrote:
> @Jorgen can you take a look at this series, especially this patch?
>
> Maybe we need to update the comments in the else branch, something like
> s/there is nothing/there is not enough data
Ok, thanks!
>
> Thanks,
> Stefano
>
> On Mon, Jul 25, 2022 at 08:01:01AM +0000, Arseniy Krasnov wrote:
>> This callback controls setting of POLLIN,POLLRDNORM output bits of poll()
>> syscall,but in some cases,it is incorrectly to set it, when socket has
>> at least 1 bytes of available data. Use 'target' which is already exists
>> and equal to sk_rcvlowat in this case.
>>
>> Signed-off-by: Arseniy Krasnov <[email protected]>
>> ---
>> net/vmw_vsock/vmci_transport_notify.c        | 2 +-
>> net/vmw_vsock/vmci_transport_notify_qstate.c | 2 +-
>> 2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/vmw_vsock/vmci_transport_notify.c b/net/vmw_vsock/vmci_transport_notify.c
>> index d69fc4b595ad..1684b85b0660 100644
>> --- a/net/vmw_vsock/vmci_transport_notify.c
>> +++ b/net/vmw_vsock/vmci_transport_notify.c
>> @@ -340,7 +340,7 @@ vmci_transport_notify_pkt_poll_in(struct sock *sk,
>> {
>>     struct vsock_sock *vsk = vsock_sk(sk);
>>
>> -    if (vsock_stream_has_data(vsk)) {
>> +    if (vsock_stream_has_data(vsk) >= target) {
>>         *data_ready_now = true;
>>     } else {
>>         /* We can't read right now because there is nothing in the
>> diff --git a/net/vmw_vsock/vmci_transport_notify_qstate.c b/net/vmw_vsock/vmci_transport_notify_qstate.c
>> index 0f36d7c45db3..a40407872b53 100644
>> --- a/net/vmw_vsock/vmci_transport_notify_qstate.c
>> +++ b/net/vmw_vsock/vmci_transport_notify_qstate.c
>> @@ -161,7 +161,7 @@ vmci_transport_notify_pkt_poll_in(struct sock *sk,
>> {
>>     struct vsock_sock *vsk = vsock_sk(sk);
>>
>> -    if (vsock_stream_has_data(vsk)) {
>> +    if (vsock_stream_has_data(vsk) >= target) {
>>         *data_ready_now = true;
>>     } else {
>>         /* We can't read right now because there is nothing in the
>> -- 
>> 2.25.1
>

2022-07-28 06:20:06

by Arseniy Krasnov

[permalink] [raw]
Subject: Re: [RFC PATCH v2 5/9] vsock: SO_RCVLOWAT transport set callback

On 27.07.2022 15:24, Stefano Garzarella wrote:
> On Mon, Jul 25, 2022 at 08:05:28AM +0000, Arseniy Krasnov wrote:
>> This adds transport specific callback for SO_RCVLOWAT, because in some
>> transports it may be difficult to know current available number of bytes
>> ready to read. Thus, when SO_RCVLOWAT is set, transport may reject it.
>>
>> Signed-off-by: Arseniy Krasnov <[email protected]>
>> ---
>> include/net/af_vsock.h   |  1 +
>> net/vmw_vsock/af_vsock.c | 19 +++++++++++++++++++
>> 2 files changed, 20 insertions(+)
>>
>> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>> index f742e50207fb..eae5874bae35 100644
>> --- a/include/net/af_vsock.h
>> +++ b/include/net/af_vsock.h
>> @@ -134,6 +134,7 @@ struct vsock_transport {
>>     u64 (*stream_rcvhiwat)(struct vsock_sock *);
>>     bool (*stream_is_active)(struct vsock_sock *);
>>     bool (*stream_allow)(u32 cid, u32 port);
>> +    int (*set_rcvlowat)(struct vsock_sock *, int);
>>
>>     /* SEQ_PACKET. */
>>     ssize_t (*seqpacket_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> index 63a13fa2686a..b7a286db4af1 100644
>> --- a/net/vmw_vsock/af_vsock.c
>> +++ b/net/vmw_vsock/af_vsock.c
>> @@ -2130,6 +2130,24 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
>>     return err;
>> }
>>
>> +static int vsock_set_rcvlowat(struct sock *sk, int val)
>> +{
>> +    const struct vsock_transport *transport;
>> +    struct vsock_sock *vsk;
>> +    int err = 0;
>> +
>> +    vsk = vsock_sk(sk);
>> +    transport = vsk->transport;
>
> `transport` can be NULL if the user call SO_RCVLOWAT before we assign it, so we should check it.
Ack
>
> I think if the transport implements `set_rcvlowat`, maybe we should set there sk->sk_rcvlowat, so I would do something like that:
>
>     if (transport && transport->set_rcvlowat)
>         err = transport->set_rcvlowat(vsk, val);
>     else
>         WRITE_ONCE(sk->sk_rcvlowat, val ? : 1);
>
>     return err;
>
> In addition I think we should check that val does not exceed vsk->buffer_size, something similar of what tcp_set_rcvlowat() does.
>
Ack
> Thanks,
> Stefano
>

2022-07-28 06:29:25

by Arseniy Krasnov

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling

On 27.07.2022 15:37, Stefano Garzarella wrote:
> Hi Arseniy,
>
> On Mon, Jul 25, 2022 at 07:54:05AM +0000, Arseniy Krasnov wrote:
>> Hello,
>>
>> This patchset includes some updates for SO_RCVLOWAT:
>>
>> 1) af_vsock:
>>   During my experiments with zerocopy receive, i found, that in some
>>   cases, poll() implementation violates POSIX: when socket has non-
>>   default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
>>   POLLRDNORM bits in 'revents' even number of bytes available to read
>>   on socket is smaller than SO_RCVLOWAT value. In this case,user sees
>>   POLLIN flag and then tries to read data(for example using  'read()'
>>   call), but read call will be blocked, because  SO_RCVLOWAT logic is
>>   supported in dequeue loop in af_vsock.c. But the same time,  POSIX
>>   requires that:
>>
>>   "POLLIN     Data other than high-priority data may be read without
>>               blocking.
>>    POLLRDNORM Normal data may be read without blocking."
>>
>>   See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.
>>
>>   So, we have, that poll() syscall returns POLLIN, but read call will
>>   be blocked.
>>
>>   Also in man page socket(7) i found that:
>>
>>   "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
>>   socket as readable only if at least SO_RCVLOWAT bytes are available."
>>
>>   I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
>>   uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
>>   this case for TCP socket, it works as POSIX required.
>>
>>   I've added some fixes to af_vsock.c and virtio_transport_common.c,
>>   test is also implemented.
>>
>> 2) virtio/vsock:
>>   It adds some optimization to wake ups, when new data arrived. Now,
>>   SO_RCVLOWAT is considered before wake up sleepers who wait new data.
>>   There is no sense, to kick waiter, when number of available bytes
>>   in socket's queue < SO_RCVLOWAT, because if we wake up reader in
>>   this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
>>   or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
>>   exit from poll() will be "spurious". This logic is also used in TCP
>>   sockets.
>
> Nice, it looks good!
Thank You!
>
>>
>> 3) vmci/vsock:
>>   Same as 2), but i'm not sure about this changes. Will be very good,
>>   to get comments from someone who knows this code.
>
> I CCed VMCI maintainers to the patch and also to this cover, maybe better to keep them in the loop for next versions.
>
> (Jorgen's and Rajesh's emails bounced back, so I'm CCing here only Bryan, Vishnu, and [email protected])
Ok, i'll CC them in the next version
>
>>
>> 4) Hyper-V:
>>   As Dexuan Cui mentioned, for Hyper-V transport it is difficult to
>>   support SO_RCVLOWAT, so he suggested to disable this feature for
>>   Hyper-V.
>
> I left a couple of comments in some patches, but it seems to me to be in a good state :-)
>
> I would just suggest a bit of a re-organization of the series (the patches are fine, just the order):
>   - introduce vsock_set_rcvlowat()
>   - disabling it for hv_sock
>   - use 'target' in virtio transports
>   - use 'target' in vmci transports
>   - use sock_rcvlowat in vsock_poll()
>     I think is better to pass sock_rcvlowat() as 'target' when the
>     transports are already able to use it
>   - add vsock_data_ready()
>   - use vsock_data_ready() in virtio transports
>   - use vsock_data_ready() in vmci transports
>   - tests
>
> What do you think?
No problem! I think i can wait for reply from VMWare guys before preparing v3
>
> Thanks,
> Stefano
>

2022-08-02 05:33:17

by Vishnu Dasa

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling



> On Jul 27, 2022, at 11:08 PM, Arseniy Krasnov <[email protected]> wrote:
>
> On 27.07.2022 15:37, Stefano Garzarella wrote:
>> Hi Arseniy,
>>
>> On Mon, Jul 25, 2022 at 07:54:05AM +0000, Arseniy Krasnov wrote:
>>> Hello,
>>>
>>> This patchset includes some updates for SO_RCVLOWAT:
>>>
>>> 1) af_vsock:
>>> During my experiments with zerocopy receive, i found, that in some
>>> cases, poll() implementation violates POSIX: when socket has non-
>>> default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
>>> POLLRDNORM bits in 'revents' even number of bytes available to read
>>> on socket is smaller than SO_RCVLOWAT value. In this case,user sees
>>> POLLIN flag and then tries to read data(for example using 'read()'
>>> call), but read call will be blocked, because SO_RCVLOWAT logic is
>>> supported in dequeue loop in af_vsock.c. But the same time, POSIX
>>> requires that:
>>>
>>> "POLLIN Data other than high-priority data may be read without
>>> blocking.
>>> POLLRDNORM Normal data may be read without blocking."
>>>
>>> See https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.open-std.org%2Fjtc1%2Fsc22%2Fopen%2Fn4217.pdf&amp;data=05%7C01%7Cvdasa%40vmware.com%7Cae83621d8709421de14b08da705faa9c%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637945853473740235%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=NrbycCcVXV9Tz8NRDYBpnDx7KpFF6BZpSRbuhz1IfJ4%3D&amp;reserved=0, page 293.
>>>
>>> So, we have, that poll() syscall returns POLLIN, but read call will
>>> be blocked.
>>>
>>> Also in man page socket(7) i found that:
>>>
>>> "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
>>> socket as readable only if at least SO_RCVLOWAT bytes are available."
>>>
>>> I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
>>> uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
>>> this case for TCP socket, it works as POSIX required.
>>>
>>> I've added some fixes to af_vsock.c and virtio_transport_common.c,
>>> test is also implemented.
>>>
>>> 2) virtio/vsock:
>>> It adds some optimization to wake ups, when new data arrived. Now,
>>> SO_RCVLOWAT is considered before wake up sleepers who wait new data.
>>> There is no sense, to kick waiter, when number of available bytes
>>> in socket's queue < SO_RCVLOWAT, because if we wake up reader in
>>> this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
>>> or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
>>> exit from poll() will be "spurious". This logic is also used in TCP
>>> sockets.
>>
>> Nice, it looks good!
> Thank You!
>>
>>>
>>> 3) vmci/vsock:
>>> Same as 2), but i'm not sure about this changes. Will be very good,
>>> to get comments from someone who knows this code.
>>
>> I CCed VMCI maintainers to the patch and also to this cover, maybe better to keep them in the loop for next versions.
>>
>> (Jorgen's and Rajesh's emails bounced back, so I'm CCing here only Bryan, Vishnu, and [email protected])
> Ok, i'll CC them in the next version
>>
>>>
>>> 4) Hyper-V:
>>> As Dexuan Cui mentioned, for Hyper-V transport it is difficult to
>>> support SO_RCVLOWAT, so he suggested to disable this feature for
>>> Hyper-V.
>>
>> I left a couple of comments in some patches, but it seems to me to be in a good state :-)
>>
>> I would just suggest a bit of a re-organization of the series (the patches are fine, just the order):
>> - introduce vsock_set_rcvlowat()
>> - disabling it for hv_sock
>> - use 'target' in virtio transports
>> - use 'target' in vmci transports
>> - use sock_rcvlowat in vsock_poll()
>> I think is better to pass sock_rcvlowat() as 'target' when the
>> transports are already able to use it
>> - add vsock_data_ready()
>> - use vsock_data_ready() in virtio transports
>> - use vsock_data_ready() in vmci transports
>> - tests
>>
>> What do you think?
> No problem! I think i can wait for reply from VMWare guys before preparing v3

Looks fine to me, especially the VMCI parts. Please send v3, and we can test it
from VMCI point of view as well.

Thanks,
Vishnu

2022-08-02 05:51:58

by Arseniy Krasnov

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling

On 02.08.2022 08:31, Vishnu Dasa wrote:
>
>
>> On Jul 27, 2022, at 11:08 PM, Arseniy Krasnov <[email protected]> wrote:
>>
>> On 27.07.2022 15:37, Stefano Garzarella wrote:
>>> Hi Arseniy,
>>>
>>> On Mon, Jul 25, 2022 at 07:54:05AM +0000, Arseniy Krasnov wrote:
>>>> Hello,
>>>>
>>>> This patchset includes some updates for SO_RCVLOWAT:
>>>>
>>>> 1) af_vsock:
>>>> During my experiments with zerocopy receive, i found, that in some
>>>> cases, poll() implementation violates POSIX: when socket has non-
>>>> default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
>>>> POLLRDNORM bits in 'revents' even number of bytes available to read
>>>> on socket is smaller than SO_RCVLOWAT value. In this case,user sees
>>>> POLLIN flag and then tries to read data(for example using 'read()'
>>>> call), but read call will be blocked, because SO_RCVLOWAT logic is
>>>> supported in dequeue loop in af_vsock.c. But the same time, POSIX
>>>> requires that:
>>>>
>>>> "POLLIN Data other than high-priority data may be read without
>>>> blocking.
>>>> POLLRDNORM Normal data may be read without blocking."
>>>>
>>>> See https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.open-std.org%2Fjtc1%2Fsc22%2Fopen%2Fn4217.pdf&amp;data=05%7C01%7Cvdasa%40vmware.com%7Cae83621d8709421de14b08da705faa9c%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637945853473740235%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=NrbycCcVXV9Tz8NRDYBpnDx7KpFF6BZpSRbuhz1IfJ4%3D&amp;reserved=0, page 293.
>>>>
>>>> So, we have, that poll() syscall returns POLLIN, but read call will
>>>> be blocked.
>>>>
>>>> Also in man page socket(7) i found that:
>>>>
>>>> "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
>>>> socket as readable only if at least SO_RCVLOWAT bytes are available."
>>>>
>>>> I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
>>>> uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
>>>> this case for TCP socket, it works as POSIX required.
>>>>
>>>> I've added some fixes to af_vsock.c and virtio_transport_common.c,
>>>> test is also implemented.
>>>>
>>>> 2) virtio/vsock:
>>>> It adds some optimization to wake ups, when new data arrived. Now,
>>>> SO_RCVLOWAT is considered before wake up sleepers who wait new data.
>>>> There is no sense, to kick waiter, when number of available bytes
>>>> in socket's queue < SO_RCVLOWAT, because if we wake up reader in
>>>> this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
>>>> or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
>>>> exit from poll() will be "spurious". This logic is also used in TCP
>>>> sockets.
>>>
>>> Nice, it looks good!
>> Thank You!
>>>
>>>>
>>>> 3) vmci/vsock:
>>>> Same as 2), but i'm not sure about this changes. Will be very good,
>>>> to get comments from someone who knows this code.
>>>
>>> I CCed VMCI maintainers to the patch and also to this cover, maybe better to keep them in the loop for next versions.
>>>
>>> (Jorgen's and Rajesh's emails bounced back, so I'm CCing here only Bryan, Vishnu, and [email protected])
>> Ok, i'll CC them in the next version
>>>
>>>>
>>>> 4) Hyper-V:
>>>> As Dexuan Cui mentioned, for Hyper-V transport it is difficult to
>>>> support SO_RCVLOWAT, so he suggested to disable this feature for
>>>> Hyper-V.
>>>
>>> I left a couple of comments in some patches, but it seems to me to be in a good state :-)
>>>
>>> I would just suggest a bit of a re-organization of the series (the patches are fine, just the order):
>>> - introduce vsock_set_rcvlowat()
>>> - disabling it for hv_sock
>>> - use 'target' in virtio transports
>>> - use 'target' in vmci transports
>>> - use sock_rcvlowat in vsock_poll()
>>> I think is better to pass sock_rcvlowat() as 'target' when the
>>> transports are already able to use it
>>> - add vsock_data_ready()
>>> - use vsock_data_ready() in virtio transports
>>> - use vsock_data_ready() in vmci transports
>>> - tests
>>>
>>> What do you think?
>> No problem! I think i can wait for reply from VMWare guys before preparing v3
>
> Looks fine to me, especially the VMCI parts. Please send v3, and we can test it
> from VMCI point of view as well.
Great, thank you for reply. I'll prepare v3 ASAP and You will be CCed

Thanks,
Arseniy
>
> Thanks,
> Vishnu

2022-08-02 06:11:43

by Vishnu Dasa

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling


> On Jul 27, 2022, at 5:37 AM, Stefano Garzarella <[email protected]> wrote:
>
> Hi Arseniy,
>
> On Mon, Jul 25, 2022 at 07:54:05AM +0000, Arseniy Krasnov wrote:
>> Hello,
>>
>> This patchset includes some updates for SO_RCVLOWAT:
>>
>> 1) af_vsock:
>> During my experiments with zerocopy receive, i found, that in some
>> cases, poll() implementation violates POSIX: when socket has non-
>> default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
>> POLLRDNORM bits in 'revents' even number of bytes available to read
>> on socket is smaller than SO_RCVLOWAT value. In this case,user sees
>> POLLIN flag and then tries to read data(for example using 'read()'
>> call), but read call will be blocked, because SO_RCVLOWAT logic is
>> supported in dequeue loop in af_vsock.c. But the same time, POSIX
>> requires that:
>>
>> "POLLIN Data other than high-priority data may be read without
>> blocking.
>> POLLRDNORM Normal data may be read without blocking."
>>
>> See https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.open-std.org%2Fjtc1%2Fsc22%2Fopen%2Fn4217.pdf&amp;data=05%7C01%7Cvdasa%40vmware.com%7C5ad2c6759fd8439e938708da6fccbee4%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637945222450930014%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=60hG3DiYufOCv1DuufSdujiLEKDNou1Ztyah3GPbOLI%3D&amp;reserved=0, page 293.
>>
>> So, we have, that poll() syscall returns POLLIN, but read call will
>> be blocked.
>>
>> Also in man page socket(7) i found that:
>>
>> "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
>> socket as readable only if at least SO_RCVLOWAT bytes are available."
>>
>> I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
>> uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
>> this case for TCP socket, it works as POSIX required.
>>
>> I've added some fixes to af_vsock.c and virtio_transport_common.c,
>> test is also implemented.
>>
>> 2) virtio/vsock:
>> It adds some optimization to wake ups, when new data arrived. Now,
>> SO_RCVLOWAT is considered before wake up sleepers who wait new data.
>> There is no sense, to kick waiter, when number of available bytes
>> in socket's queue < SO_RCVLOWAT, because if we wake up reader in
>> this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
>> or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
>> exit from poll() will be "spurious". This logic is also used in TCP
>> sockets.
>
> Nice, it looks good!
>
>>
>> 3) vmci/vsock:
>> Same as 2), but i'm not sure about this changes. Will be very good,
>> to get comments from someone who knows this code.
>
> I CCed VMCI maintainers to the patch and also to this cover, maybe
> better to keep them in the loop for next versions.
>
> (Jorgen's and Rajesh's emails bounced back, so I'm CCing here only
> Bryan, Vishnu, and [email protected])

Hi Stefano,
Jorgen and Rajesh are no longer with VMware. There's a patch in
flight to remove Rajesh from the MAINTAINERS file (Jorgen is already
removed).

Thanks,
Vishnu

2022-08-02 06:12:06

by Dexuan Cui

[permalink] [raw]
Subject: RE: [RFC PATCH v2 6/9] hv_sock: disable SO_RCVLOWAT support

> From: Arseniy Krasnov <[email protected]>
> Sent: Monday, July 25, 2022 1:07 AM
> ...
> Subject: [RFC PATCH v2 6/9] hv_sock: disable SO_RCVLOWAT support
>
> For Hyper-V it is quiet difficult to support this socket option,due to
> transport internals, so disable it.
>
> Signed-off-by: Arseniy Krasnov <[email protected]>

Thanks, Arseniy! This looks good to me.

Reviewed-by: Dexuan Cui <[email protected]>


2022-08-02 08:21:26

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling

Hi Vishnu,

On Tue, Aug 02, 2022 at 05:35:22AM +0000, Vishnu Dasa wrote:
>> On Jul 27, 2022, at 5:37 AM, Stefano Garzarella <[email protected]>
>> wrote:
>> Hi Arseniy,
>>
>> On Mon, Jul 25, 2022 at 07:54:05AM +0000, Arseniy Krasnov wrote:

[...]

>>>
>>> 3) vmci/vsock:
>>> Same as 2), but i'm not sure about this changes. Will be very good,
>>> to get comments from someone who knows this code.
>>
>> I CCed VMCI maintainers to the patch and also to this cover, maybe
>> better to keep them in the loop for next versions.
>>
>> (Jorgen's and Rajesh's emails bounced back, so I'm CCing here only
>> Bryan, Vishnu, and [email protected])
>
>Hi Stefano,
>Jorgen and Rajesh are no longer with VMware. There's a patch in
>flight to remove Rajesh from the MAINTAINERS file (Jorgen is already
>removed).

Thanks for the update! I will contact you and Bryan for any questions
with VMCI in the future :-)

Stefano