2022-08-03 14:21:09

by Arseniy Krasnov

[permalink] [raw]
Subject: [RFC PATCH v3 0/9] vsock: updates for SO_RCVLOWAT handling

Hello,

This patchset includes some updates for SO_RCVLOWAT:

1) af_vsock:
During my experiments with zerocopy receive, i found, that in some
cases, poll() implementation violates POSIX: when socket has non-
default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
POLLRDNORM bits in 'revents' even number of bytes available to read
on socket is smaller than SO_RCVLOWAT value. In this case,user sees
POLLIN flag and then tries to read data(for example using 'read()'
call), but read call will be blocked, because SO_RCVLOWAT logic is
supported in dequeue loop in af_vsock.c. But the same time, POSIX
requires that:

"POLLIN Data other than high-priority data may be read without
blocking.
POLLRDNORM Normal data may be read without blocking."

See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.

So, we have, that poll() syscall returns POLLIN, but read call will
be blocked.

Also in man page socket(7) i found that:

"Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
socket as readable only if at least SO_RCVLOWAT bytes are available."

I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
this case for TCP socket, it works as POSIX required.

I've added some fixes to af_vsock.c and virtio_transport_common.c,
test is also implemented.

2) virtio/vsock:
It adds some optimization to wake ups, when new data arrived. Now,
SO_RCVLOWAT is considered before wake up sleepers who wait new data.
There is no sense, to kick waiter, when number of available bytes
in socket's queue < SO_RCVLOWAT, because if we wake up reader in
this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
exit from poll() will be "spurious". This logic is also used in TCP
sockets.

3) vmci/vsock:
Same as 2), but i'm not sure about this changes. Will be very good,
to get comments from someone who knows this code.

4) Hyper-V:
As Dexuan Cui mentioned, for Hyper-V transport it is difficult to
support SO_RCVLOWAT, so he suggested to disable this feature for
Hyper-V.

Thank You

Arseniy Krasnov(9):
vsock: SO_RCVLOWAT transport set callback
hv_sock: disable SO_RCVLOWAT support
virtio/vsock: use 'target' in notify_poll_in callback
vmci/vsock: use 'target' in notify_poll_in callback
vsock: pass sock_rcvlowat to notify_poll_in as target
vsock: add API call for data ready
virtio/vsock: check SO_RCVLOWAT before wake up reader
vmci/vsock: check SO_RCVLOWAT before wake up reader
vsock_test: POLLIN + SO_RCVLOWAT test

include/net/af_vsock.h | 2 +
net/vmw_vsock/af_vsock.c | 38 +++++++++-
net/vmw_vsock/hyperv_transport.c | 7 ++
net/vmw_vsock/virtio_transport_common.c | 7 +-
net/vmw_vsock/vmci_transport_notify.c | 10 +--
net/vmw_vsock/vmci_transport_notify_qstate.c | 12 +--
tools/testing/vsock/vsock_test.c | 107 +++++++++++++++++++++++++++
7 files changed, 166 insertions(+), 17 deletions(-)

Changelog:

v1 -> v2:
1) Patches for VMCI transport(same as for virtio-vsock).
2) Patches for Hyper-V transport(disabling SO_RCVLOWAT setting).
3) Waiting logic in test was updated(sleep() -> poll()).

v2 -> v3:
1) Patches were reordered.
2) Commit message updated in 0005.
3) Check 'transport' pointer in 0001 for NULL.
4) Check 'value' in 0001 for > buffer_size.

--
2.25.1


2022-08-03 14:22:30

by Arseniy Krasnov

[permalink] [raw]
Subject: [RFC PATCH v3 3/9] virtio/vsock: use 'target' in notify_poll_in callback

This callback controls setting of POLLIN,POLLRDNORM output bits of poll()
syscall,but in some cases,it is incorrectly to set it, when socket has
at least 1 bytes of available data. Use 'target' which is already exists
and equal to sk_rcvlowat in this case.

Signed-off-by: Arseniy Krasnov <[email protected]>
---
net/vmw_vsock/virtio_transport_common.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index ec2c2afbf0d0..8f6356ebcdd1 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -634,10 +634,7 @@ virtio_transport_notify_poll_in(struct vsock_sock *vsk,
size_t target,
bool *data_ready_now)
{
- if (vsock_stream_has_data(vsk))
- *data_ready_now = true;
- else
- *data_ready_now = false;
+ *data_ready_now = vsock_stream_has_data(vsk) >= target;

return 0;
}
--
2.25.1

2022-08-03 14:22:46

by Arseniy Krasnov

[permalink] [raw]
Subject: [RFC PATCH v3 2/9] hv_sock: disable SO_RCVLOWAT support

For Hyper-V it is quiet difficult to support this socket option,due to
transport internals, so disable it.

Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Dexuan Cui <[email protected]>
---
net/vmw_vsock/hyperv_transport.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index e111e13b6660..5fab8f356a86 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -802,6 +802,12 @@ int hvs_notify_send_post_enqueue(struct vsock_sock *vsk, ssize_t written,
return 0;
}

+static
+int hvs_set_rcvlowat(struct vsock_sock *vsk, int val)
+{
+ return -EOPNOTSUPP;
+}
+
static struct vsock_transport hvs_transport = {
.module = THIS_MODULE,

@@ -837,6 +843,7 @@ static struct vsock_transport hvs_transport = {
.notify_send_pre_enqueue = hvs_notify_send_pre_enqueue,
.notify_send_post_enqueue = hvs_notify_send_post_enqueue,

+ .set_rcvlowat = hvs_set_rcvlowat
};

static bool hvs_check_transport(struct vsock_sock *vsk)
--
2.25.1

2022-08-03 14:23:52

by Arseniy Krasnov

[permalink] [raw]
Subject: [RFC PATCH v3 6/9] vsock: add API call for data ready

This adds 'vsock_data_ready()' which must be called by transport to kick
sleeping data readers. It checks for SO_RCVLOWAT value before waking
user,thus preventing spurious wake ups.Based on 'tcp_data_ready()' logic.

Signed-off-by: Arseniy Krasnov <[email protected]>
---
include/net/af_vsock.h | 1 +
net/vmw_vsock/af_vsock.c | 10 ++++++++++
2 files changed, 11 insertions(+)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index eae5874bae35..7b79fc5164cc 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -77,6 +77,7 @@ struct vsock_sock {
s64 vsock_stream_has_data(struct vsock_sock *vsk);
s64 vsock_stream_has_space(struct vsock_sock *vsk);
struct sock *vsock_create_connected(struct sock *parent);
+void vsock_data_ready(struct sock *sk);

/**** TRANSPORT ****/

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 3a1426eb8baa..47e80a7cbbdf 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -882,6 +882,16 @@ s64 vsock_stream_has_space(struct vsock_sock *vsk)
}
EXPORT_SYMBOL_GPL(vsock_stream_has_space);

+void vsock_data_ready(struct sock *sk)
+{
+ struct vsock_sock *vsk = vsock_sk(sk);
+
+ if (vsock_stream_has_data(vsk) >= sk->sk_rcvlowat ||
+ sock_flag(sk, SOCK_DONE))
+ sk->sk_data_ready(sk);
+}
+EXPORT_SYMBOL_GPL(vsock_data_ready);
+
static int vsock_release(struct socket *sock)
{
__vsock_release(sock->sk, 0);
--
2.25.1

2022-08-03 14:23:57

by Arseniy Krasnov

[permalink] [raw]
Subject: [RFC PATCH v3 7/9] virtio/vsock: check SO_RCVLOWAT before wake up reader

This adds extra condition to wake up data reader: do it only when number
of readable bytes >= SO_RCVLOWAT. Otherwise, there is no sense to kick
user,because it will wait until SO_RCVLOWAT bytes will be dequeued.

Signed-off-by: Arseniy Krasnov <[email protected]>
---
net/vmw_vsock/virtio_transport_common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 8f6356ebcdd1..35863132f4f1 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1081,7 +1081,7 @@ virtio_transport_recv_connected(struct sock *sk,
switch (le16_to_cpu(pkt->hdr.op)) {
case VIRTIO_VSOCK_OP_RW:
virtio_transport_recv_enqueue(vsk, pkt);
- sk->sk_data_ready(sk);
+ vsock_data_ready(sk);
return err;
case VIRTIO_VSOCK_OP_CREDIT_REQUEST:
virtio_transport_send_credit_update(vsk);
--
2.25.1

2022-08-03 14:24:09

by Arseniy Krasnov

[permalink] [raw]
Subject: [RFC PATCH v3 9/9] vsock_test: POLLIN + SO_RCVLOWAT test

This adds test to check,that when poll() returns POLLIN,POLLRDNORM bits,
next read call won't block.

Signed-off-by: Arseniy Krasnov <[email protected]>
---
tools/testing/vsock/vsock_test.c | 107 +++++++++++++++++++++++++++++++
1 file changed, 107 insertions(+)

diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index dc577461afc2..920dc5d5d979 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -18,6 +18,7 @@
#include <sys/socket.h>
#include <time.h>
#include <sys/mman.h>
+#include <poll.h>

#include "timeout.h"
#include "control.h"
@@ -596,6 +597,107 @@ static void test_seqpacket_invalid_rec_buffer_server(const struct test_opts *opt
close(fd);
}

+static void test_stream_poll_rcvlowat_server(const struct test_opts *opts)
+{
+#define RCVLOWAT_BUF_SIZE 128
+ int fd;
+ int i;
+
+ fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
+ if (fd < 0) {
+ perror("accept");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Send 1 byte. */
+ send_byte(fd, 1, 0);
+
+ control_writeln("SRVSENT");
+
+ /* Wait until client is ready to receive rest of data. */
+ control_expectln("CLNSENT");
+
+ for (i = 0; i < RCVLOWAT_BUF_SIZE - 1; i++)
+ send_byte(fd, 1, 0);
+
+ /* Keep socket in active state. */
+ control_expectln("POLLDONE");
+
+ close(fd);
+}
+
+static void test_stream_poll_rcvlowat_client(const struct test_opts *opts)
+{
+ unsigned long lowat_val = RCVLOWAT_BUF_SIZE;
+ char buf[RCVLOWAT_BUF_SIZE];
+ struct pollfd fds;
+ ssize_t read_res;
+ short poll_flags;
+ int fd;
+
+ fd = vsock_stream_connect(opts->peer_cid, 1234);
+ if (fd < 0) {
+ perror("connect");
+ exit(EXIT_FAILURE);
+ }
+
+ if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT,
+ &lowat_val, sizeof(lowat_val))) {
+ perror("setsockopt");
+ exit(EXIT_FAILURE);
+ }
+
+ control_expectln("SRVSENT");
+
+ /* At this point, server sent 1 byte. */
+ fds.fd = fd;
+ poll_flags = POLLIN | POLLRDNORM;
+ fds.events = poll_flags;
+
+ /* Try to wait for 1 sec. */
+ if (poll(&fds, 1, 1000) < 0) {
+ perror("poll");
+ exit(EXIT_FAILURE);
+ }
+
+ /* poll() must return nothing. */
+ if (fds.revents) {
+ fprintf(stderr, "Unexpected poll result %hx\n",
+ fds.revents);
+ exit(EXIT_FAILURE);
+ }
+
+ /* Tell server to send rest of data. */
+ control_writeln("CLNSENT");
+
+ /* Poll for data. */
+ if (poll(&fds, 1, 10000) < 0) {
+ perror("poll");
+ exit(EXIT_FAILURE);
+ }
+
+ /* Only these two bits are expected. */
+ if (fds.revents != poll_flags) {
+ fprintf(stderr, "Unexpected poll result %hx\n",
+ fds.revents);
+ exit(EXIT_FAILURE);
+ }
+
+ /* Use MSG_DONTWAIT, if call is going to wait, EAGAIN
+ * will be returned.
+ */
+ read_res = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
+ if (read_res != RCVLOWAT_BUF_SIZE) {
+ fprintf(stderr, "Unexpected recv result %zi\n",
+ read_res);
+ exit(EXIT_FAILURE);
+ }
+
+ control_writeln("POLLDONE");
+
+ close(fd);
+}
+
static struct test_case test_cases[] = {
{
.name = "SOCK_STREAM connection reset",
@@ -646,6 +748,11 @@ static struct test_case test_cases[] = {
.run_client = test_seqpacket_invalid_rec_buffer_client,
.run_server = test_seqpacket_invalid_rec_buffer_server,
},
+ {
+ .name = "SOCK_STREAM poll() + SO_RCVLOWAT",
+ .run_client = test_stream_poll_rcvlowat_client,
+ .run_server = test_stream_poll_rcvlowat_server,
+ },
{},
};

--
2.25.1

2022-08-03 14:33:51

by Arseniy Krasnov

[permalink] [raw]
Subject: [RFC PATCH v3 5/9] vsock: pass sock_rcvlowat to notify_poll_in as target

This callback controls setting of POLLIN,POLLRDNORM output bits of poll()
syscall,but in some cases,it is incorrectly to set it, when socket has
at least 1 bytes of available data.

Signed-off-by: Arseniy Krasnov <[email protected]>
---
net/vmw_vsock/af_vsock.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 016ad5ff78b7..3a1426eb8baa 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1066,8 +1066,9 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
if (transport && transport->stream_is_active(vsk) &&
!(sk->sk_shutdown & RCV_SHUTDOWN)) {
bool data_ready_now = false;
+ int target = sock_rcvlowat(sk, 0, INT_MAX);
int ret = transport->notify_poll_in(
- vsk, 1, &data_ready_now);
+ vsk, target, &data_ready_now);
if (ret < 0) {
mask |= EPOLLERR;
} else {
--
2.25.1

2022-08-03 14:40:53

by Arseniy Krasnov

[permalink] [raw]
Subject: [RFC PATCH v3 8/9] vmci/vsock: check SO_RCVLOWAT before wake up reader

This adds extra condition to wake up data reader: do it only when number
of readable bytes >= SO_RCVLOWAT. Otherwise, there is no sense to kick
user,because it will wait until SO_RCVLOWAT bytes will be dequeued.

Signed-off-by: Arseniy Krasnov <[email protected]>
---
net/vmw_vsock/vmci_transport_notify.c | 2 +-
net/vmw_vsock/vmci_transport_notify_qstate.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/vmw_vsock/vmci_transport_notify.c b/net/vmw_vsock/vmci_transport_notify.c
index 852097e2b9e6..7c3a7db134b2 100644
--- a/net/vmw_vsock/vmci_transport_notify.c
+++ b/net/vmw_vsock/vmci_transport_notify.c
@@ -307,7 +307,7 @@ vmci_transport_handle_wrote(struct sock *sk,
struct vsock_sock *vsk = vsock_sk(sk);
PKT_FIELD(vsk, sent_waiting_read) = false;
#endif
- sk->sk_data_ready(sk);
+ vsock_data_ready(sk);
}

static void vmci_transport_notify_pkt_socket_init(struct sock *sk)
diff --git a/net/vmw_vsock/vmci_transport_notify_qstate.c b/net/vmw_vsock/vmci_transport_notify_qstate.c
index 12f0cb8fe998..e96a88d850a8 100644
--- a/net/vmw_vsock/vmci_transport_notify_qstate.c
+++ b/net/vmw_vsock/vmci_transport_notify_qstate.c
@@ -84,7 +84,7 @@ vmci_transport_handle_wrote(struct sock *sk,
bool bottom_half,
struct sockaddr_vm *dst, struct sockaddr_vm *src)
{
- sk->sk_data_ready(sk);
+ vsock_data_ready(sk);
}

static void vsock_block_update_write_window(struct sock *sk)
@@ -282,7 +282,7 @@ vmci_transport_notify_pkt_recv_post_dequeue(
/* See the comment in
* vmci_transport_notify_pkt_send_post_enqueue().
*/
- sk->sk_data_ready(sk);
+ vsock_data_ready(sk);
}

return err;
--
2.25.1

2022-08-05 16:29:22

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/9] vsock: updates for SO_RCVLOWAT handling

Hi Arseniy,
sorry but I didn't have time to review this series. I will definitely do
it next Monday!

Have a nice weekend,
Stefano

On Wed, Aug 03, 2022 at 01:48:06PM +0000, Arseniy Krasnov wrote:
>Hello,
>
>This patchset includes some updates for SO_RCVLOWAT:
>
>1) af_vsock:
> During my experiments with zerocopy receive, i found, that in some
> cases, poll() implementation violates POSIX: when socket has non-
> default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
> POLLRDNORM bits in 'revents' even number of bytes available to read
> on socket is smaller than SO_RCVLOWAT value. In this case,user sees
> POLLIN flag and then tries to read data(for example using 'read()'
> call), but read call will be blocked, because SO_RCVLOWAT logic is
> supported in dequeue loop in af_vsock.c. But the same time, POSIX
> requires that:
>
> "POLLIN Data other than high-priority data may be read without
> blocking.
> POLLRDNORM Normal data may be read without blocking."
>
> See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.
>
> So, we have, that poll() syscall returns POLLIN, but read call will
> be blocked.
>
> Also in man page socket(7) i found that:
>
> "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
> socket as readable only if at least SO_RCVLOWAT bytes are available."
>
> I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
> uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
> this case for TCP socket, it works as POSIX required.
>
> I've added some fixes to af_vsock.c and virtio_transport_common.c,
> test is also implemented.
>
>2) virtio/vsock:
> It adds some optimization to wake ups, when new data arrived. Now,
> SO_RCVLOWAT is considered before wake up sleepers who wait new data.
> There is no sense, to kick waiter, when number of available bytes
> in socket's queue < SO_RCVLOWAT, because if we wake up reader in
> this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
> or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
> exit from poll() will be "spurious". This logic is also used in TCP
> sockets.
>
>3) vmci/vsock:
> Same as 2), but i'm not sure about this changes. Will be very good,
> to get comments from someone who knows this code.
>
>4) Hyper-V:
> As Dexuan Cui mentioned, for Hyper-V transport it is difficult to
> support SO_RCVLOWAT, so he suggested to disable this feature for
> Hyper-V.
>
>Thank You
>
>Arseniy Krasnov(9):
> vsock: SO_RCVLOWAT transport set callback
> hv_sock: disable SO_RCVLOWAT support
> virtio/vsock: use 'target' in notify_poll_in callback
> vmci/vsock: use 'target' in notify_poll_in callback
> vsock: pass sock_rcvlowat to notify_poll_in as target
> vsock: add API call for data ready
> virtio/vsock: check SO_RCVLOWAT before wake up reader
> vmci/vsock: check SO_RCVLOWAT before wake up reader
> vsock_test: POLLIN + SO_RCVLOWAT test
>
> include/net/af_vsock.h | 2 +
> net/vmw_vsock/af_vsock.c | 38 +++++++++-
> net/vmw_vsock/hyperv_transport.c | 7 ++
> net/vmw_vsock/virtio_transport_common.c | 7 +-
> net/vmw_vsock/vmci_transport_notify.c | 10 +--
> net/vmw_vsock/vmci_transport_notify_qstate.c | 12 +--
> tools/testing/vsock/vsock_test.c | 107 +++++++++++++++++++++++++++
> 7 files changed, 166 insertions(+), 17 deletions(-)
>
> Changelog:
>
> v1 -> v2:
> 1) Patches for VMCI transport(same as for virtio-vsock).
> 2) Patches for Hyper-V transport(disabling SO_RCVLOWAT setting).
> 3) Waiting logic in test was updated(sleep() -> poll()).
>
> v2 -> v3:
> 1) Patches were reordered.
> 2) Commit message updated in 0005.
> 3) Check 'transport' pointer in 0001 for NULL.
> 4) Check 'value' in 0001 for > buffer_size.
>
>--
>2.25.1

2022-08-08 06:39:56

by Arseniy Krasnov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/9] vsock: updates for SO_RCVLOWAT handling

On 05.08.2022 19:05, Stefano Garzarella wrote:
> Hi Arseniy,
> sorry but I didn't have time to review this series. I will definitely do it next Monday!
>
> Have a nice weekend,
> Stefano
Hello,
no problem

Thank You
>
> On Wed, Aug 03, 2022 at 01:48:06PM +0000, Arseniy Krasnov wrote:
>> Hello,
>>
>> This patchset includes some updates for SO_RCVLOWAT:
>>
>> 1) af_vsock:
>>   During my experiments with zerocopy receive, i found, that in some
>>   cases, poll() implementation violates POSIX: when socket has non-
>>   default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
>>   POLLRDNORM bits in 'revents' even number of bytes available to read
>>   on socket is smaller than SO_RCVLOWAT value. In this case,user sees
>>   POLLIN flag and then tries to read data(for example using  'read()'
>>   call), but read call will be blocked, because  SO_RCVLOWAT logic is
>>   supported in dequeue loop in af_vsock.c. But the same time,  POSIX
>>   requires that:
>>
>>   "POLLIN     Data other than high-priority data may be read without
>>               blocking.
>>    POLLRDNORM Normal data may be read without blocking."
>>
>>   See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.
>>
>>   So, we have, that poll() syscall returns POLLIN, but read call will
>>   be blocked.
>>
>>   Also in man page socket(7) i found that:
>>
>>   "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
>>   socket as readable only if at least SO_RCVLOWAT bytes are available."
>>
>>   I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
>>   uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
>>   this case for TCP socket, it works as POSIX required.
>>
>>   I've added some fixes to af_vsock.c and virtio_transport_common.c,
>>   test is also implemented.
>>
>> 2) virtio/vsock:
>>   It adds some optimization to wake ups, when new data arrived. Now,
>>   SO_RCVLOWAT is considered before wake up sleepers who wait new data.
>>   There is no sense, to kick waiter, when number of available bytes
>>   in socket's queue < SO_RCVLOWAT, because if we wake up reader in
>>   this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
>>   or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
>>   exit from poll() will be "spurious". This logic is also used in TCP
>>   sockets.
>>
>> 3) vmci/vsock:
>>   Same as 2), but i'm not sure about this changes. Will be very good,
>>   to get comments from someone who knows this code.
>>
>> 4) Hyper-V:
>>   As Dexuan Cui mentioned, for Hyper-V transport it is difficult to
>>   support SO_RCVLOWAT, so he suggested to disable this feature for
>>   Hyper-V.
>>
>> Thank You
>>
>> Arseniy Krasnov(9):
>> vsock: SO_RCVLOWAT transport set callback
>> hv_sock: disable SO_RCVLOWAT support
>> virtio/vsock: use 'target' in notify_poll_in callback
>> vmci/vsock: use 'target' in notify_poll_in callback
>> vsock: pass sock_rcvlowat to notify_poll_in as target
>> vsock: add API call for data ready
>> virtio/vsock: check SO_RCVLOWAT before wake up reader
>> vmci/vsock: check SO_RCVLOWAT before wake up reader
>> vsock_test: POLLIN + SO_RCVLOWAT test
>>
>> include/net/af_vsock.h                       |   2 +
>> net/vmw_vsock/af_vsock.c                     |  38 +++++++++-
>> net/vmw_vsock/hyperv_transport.c             |   7 ++
>> net/vmw_vsock/virtio_transport_common.c      |   7 +-
>> net/vmw_vsock/vmci_transport_notify.c        |  10 +--
>> net/vmw_vsock/vmci_transport_notify_qstate.c |  12 +--
>> tools/testing/vsock/vsock_test.c             | 107 +++++++++++++++++++++++++++
>> 7 files changed, 166 insertions(+), 17 deletions(-)
>>
>> Changelog:
>>
>> v1 -> v2:
>> 1) Patches for VMCI transport(same as for virtio-vsock).
>> 2) Patches for Hyper-V transport(disabling SO_RCVLOWAT setting).
>> 3) Waiting logic in test was updated(sleep() -> poll()).
>>
>> v2 -> v3:
>> 1) Patches were reordered.
>> 2) Commit message updated in 0005.
>> 3) Check 'transport' pointer in 0001 for NULL.
>> 4) Check 'value' in 0001 for > buffer_size.
>>
>> -- 
>> 2.25.1
>

2022-08-08 10:35:10

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [RFC PATCH v3 2/9] hv_sock: disable SO_RCVLOWAT support

On Wed, Aug 03, 2022 at 01:53:23PM +0000, Arseniy Krasnov wrote:
>For Hyper-V it is quiet difficult to support this socket option,due to
>transport internals, so disable it.
>
>Signed-off-by: Arseniy Krasnov <[email protected]>
>Reviewed-by: Dexuan Cui <[email protected]>
>---
> net/vmw_vsock/hyperv_transport.c | 7 +++++++
> 1 file changed, 7 insertions(+)

Reviewed-by: Stefano Garzarella <[email protected]>

2022-08-08 11:00:46

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [RFC PATCH v3 3/9] virtio/vsock: use 'target' in notify_poll_in callback

On Wed, Aug 03, 2022 at 01:55:36PM +0000, Arseniy Krasnov wrote:
>This callback controls setting of POLLIN,POLLRDNORM output bits of poll()
>syscall,but in some cases,it is incorrectly to set it, when socket has
>at least 1 bytes of available data. Use 'target' which is already exists
>and equal to sk_rcvlowat in this case.

Little suggestion:
We should update the commit description, since so far 'target' is not
equal to sk_rcvlowat.

With that fixed (and adding some spaces after the commas):

Reviewed-by: Stefano Garzarella <[email protected]>

>
>Signed-off-by: Arseniy Krasnov <[email protected]>
>---
> net/vmw_vsock/virtio_transport_common.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index ec2c2afbf0d0..8f6356ebcdd1 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -634,10 +634,7 @@ virtio_transport_notify_poll_in(struct vsock_sock *vsk,
> size_t target,
> bool *data_ready_now)
> {
>- if (vsock_stream_has_data(vsk))
>- *data_ready_now = true;
>- else
>- *data_ready_now = false;
>+ *data_ready_now = vsock_stream_has_data(vsk) >= target;
>
> return 0;
> }
>--
>2.25.1

2022-08-08 11:03:44

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [RFC PATCH v3 6/9] vsock: add API call for data ready

On Wed, Aug 03, 2022 at 02:01:57PM +0000, Arseniy Krasnov wrote:
>This adds 'vsock_data_ready()' which must be called by transport to kick
>sleeping data readers. It checks for SO_RCVLOWAT value before waking
>user,thus preventing spurious wake ups.Based on 'tcp_data_ready()' logic.
>

Since it's an RFC, I suggest you add a space after the punctuation. :-)

The patch LGTM:

Reviewed-by: Stefano Garzarella <[email protected]>

>Signed-off-by: Arseniy Krasnov <[email protected]>
>---
> include/net/af_vsock.h | 1 +
> net/vmw_vsock/af_vsock.c | 10 ++++++++++
> 2 files changed, 11 insertions(+)
>
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index eae5874bae35..7b79fc5164cc 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -77,6 +77,7 @@ struct vsock_sock {
> s64 vsock_stream_has_data(struct vsock_sock *vsk);
> s64 vsock_stream_has_space(struct vsock_sock *vsk);
> struct sock *vsock_create_connected(struct sock *parent);
>+void vsock_data_ready(struct sock *sk);
>
> /**** TRANSPORT ****/
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 3a1426eb8baa..47e80a7cbbdf 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -882,6 +882,16 @@ s64 vsock_stream_has_space(struct vsock_sock *vsk)
> }
> EXPORT_SYMBOL_GPL(vsock_stream_has_space);
>
>+void vsock_data_ready(struct sock *sk)
>+{
>+ struct vsock_sock *vsk = vsock_sk(sk);
>+
>+ if (vsock_stream_has_data(vsk) >= sk->sk_rcvlowat ||
>+ sock_flag(sk, SOCK_DONE))
>+ sk->sk_data_ready(sk);
>+}
>+EXPORT_SYMBOL_GPL(vsock_data_ready);
>+
> static int vsock_release(struct socket *sock)
> {
> __vsock_release(sock->sk, 0);
>--
>2.25.1

2022-08-08 11:05:48

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [RFC PATCH v3 7/9] virtio/vsock: check SO_RCVLOWAT before wake up reader

On Wed, Aug 03, 2022 at 02:03:58PM +0000, Arseniy Krasnov wrote:
>This adds extra condition to wake up data reader: do it only when number
>of readable bytes >= SO_RCVLOWAT. Otherwise, there is no sense to kick
>user,because it will wait until SO_RCVLOWAT bytes will be dequeued.

Maybe we can mention that these are done in vsock_data_ready().

Anyway, the patch LGTM:

Reviewed-by: Stefano Garzarella <[email protected]>

>
>Signed-off-by: Arseniy Krasnov <[email protected]>
>---
> net/vmw_vsock/virtio_transport_common.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index 8f6356ebcdd1..35863132f4f1 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -1081,7 +1081,7 @@ virtio_transport_recv_connected(struct sock *sk,
> switch (le16_to_cpu(pkt->hdr.op)) {
> case VIRTIO_VSOCK_OP_RW:
> virtio_transport_recv_enqueue(vsk, pkt);
>- sk->sk_data_ready(sk);
>+ vsock_data_ready(sk);
> return err;
> case VIRTIO_VSOCK_OP_CREDIT_REQUEST:
> virtio_transport_send_credit_update(vsk);
>--
>2.25.1

2022-08-08 11:14:48

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [RFC PATCH v3 8/9] vmci/vsock: check SO_RCVLOWAT before wake up reader

On Wed, Aug 03, 2022 at 02:05:52PM +0000, Arseniy Krasnov wrote:
>This adds extra condition to wake up data reader: do it only when number
>of readable bytes >= SO_RCVLOWAT. Otherwise, there is no sense to kick
>user,because it will wait until SO_RCVLOWAT bytes will be dequeued.

Ditto as previous patch.

@Bryan, @Vishnu, plaese, can you review/ack also this patch?

Thanks,
Stefano

>
>Signed-off-by: Arseniy Krasnov <[email protected]>
>---
> net/vmw_vsock/vmci_transport_notify.c | 2 +-
> net/vmw_vsock/vmci_transport_notify_qstate.c | 4 ++--
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
>diff --git a/net/vmw_vsock/vmci_transport_notify.c b/net/vmw_vsock/vmci_transport_notify.c
>index 852097e2b9e6..7c3a7db134b2 100644
>--- a/net/vmw_vsock/vmci_transport_notify.c
>+++ b/net/vmw_vsock/vmci_transport_notify.c
>@@ -307,7 +307,7 @@ vmci_transport_handle_wrote(struct sock *sk,
> struct vsock_sock *vsk = vsock_sk(sk);
> PKT_FIELD(vsk, sent_waiting_read) = false;
> #endif
>- sk->sk_data_ready(sk);
>+ vsock_data_ready(sk);
> }
>
> static void vmci_transport_notify_pkt_socket_init(struct sock *sk)
>diff --git a/net/vmw_vsock/vmci_transport_notify_qstate.c b/net/vmw_vsock/vmci_transport_notify_qstate.c
>index 12f0cb8fe998..e96a88d850a8 100644
>--- a/net/vmw_vsock/vmci_transport_notify_qstate.c
>+++ b/net/vmw_vsock/vmci_transport_notify_qstate.c
>@@ -84,7 +84,7 @@ vmci_transport_handle_wrote(struct sock *sk,
> bool bottom_half,
> struct sockaddr_vm *dst, struct sockaddr_vm *src)
> {
>- sk->sk_data_ready(sk);
>+ vsock_data_ready(sk);
> }
>
> static void vsock_block_update_write_window(struct sock *sk)
>@@ -282,7 +282,7 @@ vmci_transport_notify_pkt_recv_post_dequeue(
> /* See the comment in
> * vmci_transport_notify_pkt_send_post_enqueue().
> */
>- sk->sk_data_ready(sk);
>+ vsock_data_ready(sk);
> }
>
> return err;
>--
>2.25.1

2022-08-08 11:22:48

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [RFC PATCH v3 5/9] vsock: pass sock_rcvlowat to notify_poll_in as target

On Wed, Aug 03, 2022 at 01:59:49PM +0000, Arseniy Krasnov wrote:
>This callback controls setting of POLLIN,POLLRDNORM output bits of poll()
>syscall,but in some cases,it is incorrectly to set it, when socket has
>at least 1 bytes of available data.

I suggest you refrase the description a bit, which should describe what
was the problem and what the patch does, so I was thinking something
like this:

Passing 1 as the target to notify_poll_in(), we don't honor
what the user has set via SO_RCVLOWAT, going to set POLLIN
and POLLRDNORM, even if we don't have the amount of bytes
expected by the user.

Let's use sock_rcvlowat() to get the right target to pass to
notify_poll_in().

Anyway, the patch LGTM:

Reviewed-by: Stefano Garzarella <[email protected]>

>
>Signed-off-by: Arseniy Krasnov <[email protected]>
>---
> net/vmw_vsock/af_vsock.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 016ad5ff78b7..3a1426eb8baa 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -1066,8 +1066,9 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
> if (transport && transport->stream_is_active(vsk) &&
> !(sk->sk_shutdown & RCV_SHUTDOWN)) {
> bool data_ready_now = false;
>+ int target = sock_rcvlowat(sk, 0, INT_MAX);
> int ret = transport->notify_poll_in(
>- vsk, 1, &data_ready_now);
>+ vsk, target, &data_ready_now);
> if (ret < 0) {
> mask |= EPOLLERR;
> } else {
>--
>2.25.1

2022-08-08 11:28:44

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [RFC PATCH v3 9/9] vsock_test: POLLIN + SO_RCVLOWAT test

On Wed, Aug 03, 2022 at 02:07:58PM +0000, Arseniy Krasnov wrote:
>This adds test to check,that when poll() returns POLLIN,POLLRDNORM bits,
>next read call won't block.
>
>Signed-off-by: Arseniy Krasnov <[email protected]>
>---
> tools/testing/vsock/vsock_test.c | 107 +++++++++++++++++++++++++++++++
> 1 file changed, 107 insertions(+)
>
>diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
>index dc577461afc2..920dc5d5d979 100644
>--- a/tools/testing/vsock/vsock_test.c
>+++ b/tools/testing/vsock/vsock_test.c
>@@ -18,6 +18,7 @@
> #include <sys/socket.h>
> #include <time.h>
> #include <sys/mman.h>
>+#include <poll.h>
>
> #include "timeout.h"
> #include "control.h"
>@@ -596,6 +597,107 @@ static void test_seqpacket_invalid_rec_buffer_server(const struct test_opts *opt
> close(fd);
> }
>
>+static void test_stream_poll_rcvlowat_server(const struct test_opts *opts)
>+{
>+#define RCVLOWAT_BUF_SIZE 128
>+ int fd;
>+ int i;
>+
>+ fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
>+ if (fd < 0) {
>+ perror("accept");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ /* Send 1 byte. */
>+ send_byte(fd, 1, 0);
>+
>+ control_writeln("SRVSENT");
>+
>+ /* Wait until client is ready to receive rest of data. */
>+ control_expectln("CLNSENT");
>+
>+ for (i = 0; i < RCVLOWAT_BUF_SIZE - 1; i++)
>+ send_byte(fd, 1, 0);
>+
>+ /* Keep socket in active state. */
>+ control_expectln("POLLDONE");
>+
>+ close(fd);
>+}
>+
>+static void test_stream_poll_rcvlowat_client(const struct test_opts *opts)
>+{
>+ unsigned long lowat_val = RCVLOWAT_BUF_SIZE;
>+ char buf[RCVLOWAT_BUF_SIZE];
>+ struct pollfd fds;
>+ ssize_t read_res;
>+ short poll_flags;
>+ int fd;
>+
>+ fd = vsock_stream_connect(opts->peer_cid, 1234);
>+ if (fd < 0) {
>+ perror("connect");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT,
>+ &lowat_val, sizeof(lowat_val))) {

A small checkpatch warning that you can fix since you have to resend:

CHECK: Alignment should match open parenthesis
#76: FILE: tools/testing/vsock/vsock_test.c:645:
+ if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT,
+ &lowat_val, sizeof(lowat_val))) {

total: 0 errors, 0 warnings, 1 checks, 125 lines checked

Thanks,
Stefano

2022-08-08 11:32:04

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [RFC PATCH v3 9/9] vsock_test: POLLIN + SO_RCVLOWAT test

On Wed, Aug 03, 2022 at 02:07:58PM +0000, Arseniy Krasnov wrote:
>This adds test to check,that when poll() returns POLLIN,POLLRDNORM bits,
>next read call won't block.
>
>Signed-off-by: Arseniy Krasnov <[email protected]>
>---
> tools/testing/vsock/vsock_test.c | 107 +++++++++++++++++++++++++++++++
> 1 file changed, 107 insertions(+)
>
>diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
>index dc577461afc2..920dc5d5d979 100644
>--- a/tools/testing/vsock/vsock_test.c
>+++ b/tools/testing/vsock/vsock_test.c
>@@ -18,6 +18,7 @@
> #include <sys/socket.h>
> #include <time.h>
> #include <sys/mman.h>
>+#include <poll.h>
>
> #include "timeout.h"
> #include "control.h"
>@@ -596,6 +597,107 @@ static void test_seqpacket_invalid_rec_buffer_server(const struct test_opts *opt
> close(fd);
> }
>
>+static void test_stream_poll_rcvlowat_server(const struct test_opts *opts)
>+{
>+#define RCVLOWAT_BUF_SIZE 128

Since we use this macro on both server and client functions, I suggest
to move this define outside the function.

Other than that the test LGTM. I also ran it and everything is fine :-)
thanks for adding it!

Reviewed-by: Stefano Garzarella <[email protected]>

2022-08-08 11:52:24

by Stefano Garzarella

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/9] vsock: updates for SO_RCVLOWAT handling

Hi Arseniy,

On Wed, Aug 03, 2022 at 01:48:06PM +0000, Arseniy Krasnov wrote:
>Hello,
>
>This patchset includes some updates for SO_RCVLOWAT:

I have reviewed all the patches, run tests and everything seems okay :-)

I left some minor comments and asked Bryan and Vishnu to take a better
look at VMCI patches.

In general I ask you to revisit the patch descriptions a bit (for
example adding a space after punctuation). The next version I think can
be without RFC.

Remember to send it with the net-next tag.
Note: net-next is closed for now since we are in the merge window.
It should re-open in a week (you can check here:
http://vger.kernel.org/~davem/net-next.html).

I'll be on vacation the next 2 weeks (Aug 15 - 28), but I'll try to
check out your patches!

Thanks,
Stefano

2022-08-09 09:22:25

by Arseniy Krasnov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/9] vsock: updates for SO_RCVLOWAT handling

On 08.08.2022 14:22, Stefano Garzarella wrote:
> Hi Arseniy,
>
> On Wed, Aug 03, 2022 at 01:48:06PM +0000, Arseniy Krasnov wrote:
>> Hello,
>>
>> This patchset includes some updates for SO_RCVLOWAT:
>
> I have reviewed all the patches, run tests and everything seems okay :-)
Thank You
>
> I left some minor comments and asked Bryan and Vishnu to take a better look at VMCI patches.
Ok, let's wait their reply before v4
>
> In general I ask you to revisit the patch descriptions a bit (for example adding a space after punctuation). The next version I think can be without RFC.
ack
>
> Remember to send it with the net-next tag.
> Note: net-next is closed for now since we are in the merge window.
> It should re-open in a week (you can check here: http://vger.kernel.org/~davem/net-next.html).
>
> I'll be on vacation the next 2 weeks (Aug 15 - 28), but I'll try to check out your patches!
No problem, thanks
>
> Thanks,
> Stefano
>

2022-08-19 04:52:27

by Vishnu Dasa

[permalink] [raw]
Subject: Re: [RFC PATCH v3 8/9] vmci/vsock: check SO_RCVLOWAT before wake up reader



> On Aug 8, 2022, at 4:01 AM, Stefano Garzarella <[email protected]> wrote:
>
> On Wed, Aug 03, 2022 at 02:05:52PM +0000, Arseniy Krasnov wrote:
>> This adds extra condition to wake up data reader: do it only when number
>> of readable bytes >= SO_RCVLOWAT. Otherwise, there is no sense to kick
>> user,because it will wait until SO_RCVLOWAT bytes will be dequeued.

Nit: add space after comma.

> Ditto as previous patch.
>
> @Bryan, @Vishnu, plaese, can you review/ack also this patch?

This patch looks good to me.

Thank you, Arseniy for running the new test with VMCI. I also ran some
of our internal tests successfully with this patch series.

> Thanks,
> Stefano
>
>>
>> Signed-off-by: Arseniy Krasnov <[email protected]>

Reviewed-by: Vishnu Dasa <[email protected]>

>> ---
>> net/vmw_vsock/vmci_transport_notify.c | 2 +-
>> net/vmw_vsock/vmci_transport_notify_qstate.c | 4 ++--
>> 2 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/vmw_vsock/vmci_transport_notify.c b/net/vmw_vsock/vmci_transport_notify.c
>> index 852097e2b9e6..7c3a7db134b2 100644
>> --- a/net/vmw_vsock/vmci_transport_notify.c
>> +++ b/net/vmw_vsock/vmci_transport_notify.c
>> @@ -307,7 +307,7 @@ vmci_transport_handle_wrote(struct sock *sk,
>> struct vsock_sock *vsk = vsock_sk(sk);
>> PKT_FIELD(vsk, sent_waiting_read) = false;
>> #endif
>> - sk->sk_data_ready(sk);
>> + vsock_data_ready(sk);
>> }
>>
>> static void vmci_transport_notify_pkt_socket_init(struct sock *sk)
>> diff --git a/net/vmw_vsock/vmci_transport_notify_qstate.c b/net/vmw_vsock/vmci_transport_notify_qstate.c
>> index 12f0cb8fe998..e96a88d850a8 100644
>> --- a/net/vmw_vsock/vmci_transport_notify_qstate.c
>> +++ b/net/vmw_vsock/vmci_transport_notify_qstate.c
>> @@ -84,7 +84,7 @@ vmci_transport_handle_wrote(struct sock *sk,
>> bool bottom_half,
>> struct sockaddr_vm *dst, struct sockaddr_vm *src)
>> {
>> - sk->sk_data_ready(sk);
>> + vsock_data_ready(sk);
>> }
>>
>> static void vsock_block_update_write_window(struct sock *sk)
>> @@ -282,7 +282,7 @@ vmci_transport_notify_pkt_recv_post_dequeue(
>> /* See the comment in
>> * vmci_transport_notify_pkt_send_post_enqueue().
>> */
>> - sk->sk_data_ready(sk);
>> + vsock_data_ready(sk);
>> }
>>
>> return err;
>> --
>> 2.25.1