This series adds support for UDP segmentation offload feature
in TUN device according to the VIRTIO specification
Yuri Benditovich (4):
virtio-net: add definitions for host USO feature
virtio-net: add support of UDP segmentation (USO) on the host
tun: define feature bit for USO support
tun: indicate support for USO feature
drivers/net/tun.c | 2 +-
include/linux/virtio_net.h | 5 +++++
include/uapi/linux/if_tun.h | 1 +
include/uapi/linux/virtio_net.h | 2 ++
4 files changed, 9 insertions(+), 1 deletion(-)
--
2.26.3
Define feature bit and GSO type according to the VIRTIO
specification.
Signed-off-by: Yuri Benditovich <[email protected]>
---
include/uapi/linux/virtio_net.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index 3f55a4215f11..a556ac735d7f 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -57,6 +57,7 @@
* Steering */
#define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
+#define VIRTIO_NET_F_HOST_USO 56 /* Host can handle USO packets */
#define VIRTIO_NET_F_HASH_REPORT 57 /* Supports hash report */
#define VIRTIO_NET_F_RSS 60 /* Supports RSS RX steering */
#define VIRTIO_NET_F_RSC_EXT 61 /* extended coalescing info */
@@ -130,6 +131,7 @@ struct virtio_net_hdr_v1 {
#define VIRTIO_NET_HDR_GSO_TCPV4 1 /* GSO frame, IPv4 TCP (TSO) */
#define VIRTIO_NET_HDR_GSO_UDP 3 /* GSO frame, IPv4 UDP (UFO) */
#define VIRTIO_NET_HDR_GSO_TCPV6 4 /* GSO frame, IPv6 TCP */
+#define VIRTIO_NET_HDR_GSO_UDP_L4 5 /* GSO frame, IPv4 UDP (USO) */
#define VIRTIO_NET_HDR_GSO_ECN 0x80 /* TCP has ECN set */
__u8 gso_type;
__virtio16 hdr_len; /* Ethernet + IP + tcp/udp hdrs */
--
2.26.3
Large UDP packet provided by the guest with GSO type set to
VIRTIO_NET_HDR_GSO_UDP_L4 will be divided to several UDP
packets according to the gso_size field.
Signed-off-by: Yuri Benditovich <[email protected]>
---
include/linux/virtio_net.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index b465f8f3e554..4ecf9a1ca912 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -51,6 +51,11 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
ip_proto = IPPROTO_UDP;
thlen = sizeof(struct udphdr);
break;
+ case VIRTIO_NET_HDR_GSO_UDP_L4:
+ gso_type = SKB_GSO_UDP_L4;
+ ip_proto = IPPROTO_UDP;
+ thlen = sizeof(struct udphdr);
+ break;
default:
return -EINVAL;
}
--
2.26.3
Signed-off-by: Yuri Benditovich <[email protected]>
---
drivers/net/tun.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 84f832806313..a35054f9d941 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2812,7 +2812,7 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
arg &= ~(TUN_F_TSO4|TUN_F_TSO6);
}
- arg &= ~TUN_F_UFO;
+ arg &= ~(TUN_F_UFO|TUN_F_USO);
}
/* This gives the user a way to test for new features in future by
--
2.26.3
User mode software can probe this bit to check whether the
USO feature is supported by TUN/TAP device.
Signed-off-by: Yuri Benditovich <[email protected]>
---
include/uapi/linux/if_tun.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
index 454ae31b93c7..24f246920dd5 100644
--- a/include/uapi/linux/if_tun.h
+++ b/include/uapi/linux/if_tun.h
@@ -88,6 +88,7 @@
#define TUN_F_TSO6 0x04 /* I can handle TSO for IPv6 packets */
#define TUN_F_TSO_ECN 0x08 /* I can handle TSO with ECN bits. */
#define TUN_F_UFO 0x10 /* I can handle UFO packets */
+#define TUN_F_USO 0x20 /* I can handle USO packets */
/* Protocol info prepended to the packets (when IFF_NO_PI is not set) */
#define TUN_PKT_STRIP 0x0001
--
2.26.3
在 2021/5/11 下午12:42, Yuri Benditovich 写道:
> Define feature bit and GSO type according to the VIRTIO
> specification.
>
> Signed-off-by: Yuri Benditovich <[email protected]>
> ---
> include/uapi/linux/virtio_net.h | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> index 3f55a4215f11..a556ac735d7f 100644
> --- a/include/uapi/linux/virtio_net.h
> +++ b/include/uapi/linux/virtio_net.h
> @@ -57,6 +57,7 @@
> * Steering */
> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
>
> +#define VIRTIO_NET_F_HOST_USO 56 /* Host can handle USO packets */
> #define VIRTIO_NET_F_HASH_REPORT 57 /* Supports hash report */
> #define VIRTIO_NET_F_RSS 60 /* Supports RSS RX steering */
> #define VIRTIO_NET_F_RSC_EXT 61 /* extended coalescing info */
> @@ -130,6 +131,7 @@ struct virtio_net_hdr_v1 {
> #define VIRTIO_NET_HDR_GSO_TCPV4 1 /* GSO frame, IPv4 TCP (TSO) */
> #define VIRTIO_NET_HDR_GSO_UDP 3 /* GSO frame, IPv4 UDP (UFO) */
> #define VIRTIO_NET_HDR_GSO_TCPV6 4 /* GSO frame, IPv6 TCP */
> +#define VIRTIO_NET_HDR_GSO_UDP_L4 5 /* GSO frame, IPv4 UDP (USO) */
This is the gso_type not the feature actually.
I wonder what's the reason for not
1) introducing a dedicated virtio-net feature bit for this
(VIRTIO_NET_F_GUEST_GSO_UDP_L4.
2) toggle the NETIF_F_GSO_UDP_L4 feature for tuntap based on the
negotiated feature.
Thanks
> #define VIRTIO_NET_HDR_GSO_ECN 0x80 /* TCP has ECN set */
> __u8 gso_type;
> __virtio16 hdr_len; /* Ethernet + IP + tcp/udp hdrs */
?? 2021/5/11 ????12:42, Yuri Benditovich д??:
> Large UDP packet provided by the guest with GSO type set to
> VIRTIO_NET_HDR_GSO_UDP_L4 will be divided to several UDP
> packets according to the gso_size field.
>
> Signed-off-by: Yuri Benditovich <[email protected]>
> ---
> include/linux/virtio_net.h | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
> index b465f8f3e554..4ecf9a1ca912 100644
> --- a/include/linux/virtio_net.h
> +++ b/include/linux/virtio_net.h
> @@ -51,6 +51,11 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
> ip_proto = IPPROTO_UDP;
> thlen = sizeof(struct udphdr);
> break;
> + case VIRTIO_NET_HDR_GSO_UDP_L4:
> + gso_type = SKB_GSO_UDP_L4;
> + ip_proto = IPPROTO_UDP;
> + thlen = sizeof(struct udphdr);
> + break;
This is only for rx, how about tx?
Thanks
> default:
> return -EINVAL;
> }
?? 2021/5/11 ????12:42, Yuri Benditovich д??:
> Signed-off-by: Yuri Benditovich <[email protected]>
> ---
> drivers/net/tun.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 84f832806313..a35054f9d941 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -2812,7 +2812,7 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
> arg &= ~(TUN_F_TSO4|TUN_F_TSO6);
> }
>
> - arg &= ~TUN_F_UFO;
> + arg &= ~(TUN_F_UFO|TUN_F_USO);
It looks to me kernel doesn't use "USO", so TUN_F_UDP_GSO_L4 is a better
name for this and I guess we should toggle NETIF_F_UDP_GSO_l4 here?
And how about macvtap?
Thanks
> }
>
> /* This gives the user a way to test for new features in future by
On Tue, May 11, 2021 at 9:47 AM Jason Wang <[email protected]> wrote:
>
>
> 在 2021/5/11 下午12:42, Yuri Benditovich 写道:
> > Define feature bit and GSO type according to the VIRTIO
> > specification.
> >
> > Signed-off-by: Yuri Benditovich <[email protected]>
> > ---
> > include/uapi/linux/virtio_net.h | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> > index 3f55a4215f11..a556ac735d7f 100644
> > --- a/include/uapi/linux/virtio_net.h
> > +++ b/include/uapi/linux/virtio_net.h
> > @@ -57,6 +57,7 @@
> > * Steering */
> > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
> >
> > +#define VIRTIO_NET_F_HOST_USO 56 /* Host can handle USO packets */
This is the virtio-net feature
> > #define VIRTIO_NET_F_HASH_REPORT 57 /* Supports hash report */
> > #define VIRTIO_NET_F_RSS 60 /* Supports RSS RX steering */
> > #define VIRTIO_NET_F_RSC_EXT 61 /* extended coalescing info */
> > @@ -130,6 +131,7 @@ struct virtio_net_hdr_v1 {
> > #define VIRTIO_NET_HDR_GSO_TCPV4 1 /* GSO frame, IPv4 TCP (TSO) */
> > #define VIRTIO_NET_HDR_GSO_UDP 3 /* GSO frame, IPv4 UDP (UFO) */
> > #define VIRTIO_NET_HDR_GSO_TCPV6 4 /* GSO frame, IPv6 TCP */
> > +#define VIRTIO_NET_HDR_GSO_UDP_L4 5 /* GSO frame, IPv4 UDP (USO) */
This is respective GSO type
>
>
> This is the gso_type not the feature actually.
>
> I wonder what's the reason for not
>
> 1) introducing a dedicated virtio-net feature bit for this
> (VIRTIO_NET_F_GUEST_GSO_UDP_L4.
This series is not for GUEST's feature, it is only for host feature.
> 2) toggle the NETIF_F_GSO_UDP_L4 feature for tuntap based on the
> negotiated feature.
The NETIF_F_GSO_UDP_L4 would be required for the guest RX path.
The guest TX path does not require any flags to be propagated, it only
allows the guest to transmit large UDP packets and have them
automatically splitted.
(This is similar to HOST_UFO but does packet segmentation instead of
fragmentation. GUEST_UFO indeed requires a respective NETIF flag, as
it is unclear whether the guest is capable of receiving such packets).
>
> Thanks
>
>
> > #define VIRTIO_NET_HDR_GSO_ECN 0x80 /* TCP has ECN set */
> > __u8 gso_type;
> > __virtio16 hdr_len; /* Ethernet + IP + tcp/udp hdrs */
>
On Tue, May 11, 2021 at 9:47 AM Jason Wang <[email protected]> wrote:
>
>
> 在 2021/5/11 下午12:42, Yuri Benditovich 写道:
> > Large UDP packet provided by the guest with GSO type set to
> > VIRTIO_NET_HDR_GSO_UDP_L4 will be divided to several UDP
> > packets according to the gso_size field.
> >
> > Signed-off-by: Yuri Benditovich <[email protected]>
> > ---
> > include/linux/virtio_net.h | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
> > index b465f8f3e554..4ecf9a1ca912 100644
> > --- a/include/linux/virtio_net.h
> > +++ b/include/linux/virtio_net.h
> > @@ -51,6 +51,11 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
> > ip_proto = IPPROTO_UDP;
> > thlen = sizeof(struct udphdr);
> > break;
> > + case VIRTIO_NET_HDR_GSO_UDP_L4:
> > + gso_type = SKB_GSO_UDP_L4;
> > + ip_proto = IPPROTO_UDP;
> > + thlen = sizeof(struct udphdr);
> > + break;
>
>
> This is only for rx, how about tx?
In terms of the guest this is only for TX.
Guest RX is a different thing, this is actually coalescing of
segmented UDP packets into a large one.
This feature is not defined in the virtio spec yet and the support of
it first of all depends on the OS.
For example: TCP LSO (guest TX) is supported almost by all the
versions of Windows.
TCP RSC (coalescing of TCP segments) is supported by Win 8 / Server 2012 and up.
UDP segmentation is supported by Windows kernels 1903+
UDP coalescing is defined by Windows kernels 2004+ and not supported
by the driver yet.
>
> Thanks
>
>
>
> > default:
> > return -EINVAL;
> > }
>
On Tue, May 11, 2021 at 4:12 PM Yuri Benditovich
<[email protected]> wrote:
>
> On Tue, May 11, 2021 at 9:47 AM Jason Wang <[email protected]> wrote:
> >
> >
> > 在 2021/5/11 下午12:42, Yuri Benditovich 写道:
> > > Define feature bit and GSO type according to the VIRTIO
> > > specification.
> > >
> > > Signed-off-by: Yuri Benditovich <[email protected]>
> > > ---
> > > include/uapi/linux/virtio_net.h | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> > > index 3f55a4215f11..a556ac735d7f 100644
> > > --- a/include/uapi/linux/virtio_net.h
> > > +++ b/include/uapi/linux/virtio_net.h
> > > @@ -57,6 +57,7 @@
> > > * Steering */
> > > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
> > >
> > > +#define VIRTIO_NET_F_HOST_USO 56 /* Host can handle USO packets */
>
> This is the virtio-net feature
Right, I miss this part.
>
> > > #define VIRTIO_NET_F_HASH_REPORT 57 /* Supports hash report */
> > > #define VIRTIO_NET_F_RSS 60 /* Supports RSS RX steering */
> > > #define VIRTIO_NET_F_RSC_EXT 61 /* extended coalescing info */
> > > @@ -130,6 +131,7 @@ struct virtio_net_hdr_v1 {
> > > #define VIRTIO_NET_HDR_GSO_TCPV4 1 /* GSO frame, IPv4 TCP (TSO) */
> > > #define VIRTIO_NET_HDR_GSO_UDP 3 /* GSO frame, IPv4 UDP (UFO) */
> > > #define VIRTIO_NET_HDR_GSO_TCPV6 4 /* GSO frame, IPv6 TCP */
> > > +#define VIRTIO_NET_HDR_GSO_UDP_L4 5 /* GSO frame, IPv4 UDP (USO) */
>
> This is respective GSO type
>
> >
> >
> > This is the gso_type not the feature actually.
> >
> > I wonder what's the reason for not
> >
> > 1) introducing a dedicated virtio-net feature bit for this
> > (VIRTIO_NET_F_GUEST_GSO_UDP_L4.
>
> This series is not for GUEST's feature, it is only for host feature.
>
> > 2) toggle the NETIF_F_GSO_UDP_L4 feature for tuntap based on the
> > negotiated feature.
>
> The NETIF_F_GSO_UDP_L4 would be required for the guest RX path.
> The guest TX path does not require any flags to be propagated, it only
> allows the guest to transmit large UDP packets and have them
> automatically splitted.
> (This is similar to HOST_UFO but does packet segmentation instead of
> fragmentation. GUEST_UFO indeed requires a respective NETIF flag, as
> it is unclear whether the guest is capable of receiving such packets).
So I think it's better to implement TX/RX in the same series unless
there's something missed:
For Guest TX, NETIF_F_GSO_UDP_L4 needs to be enabled in the guest
virtio-net only when VIRTIO_NET_F_HOST_USO is negotiated.
For guest RX, NETIF_F_GSO_UDP_L4 needs to be enabled on the host
tuntap only when VIRTIO_NET_F_GUEST_USO is neogiated.
Thanks
>
> >
> > Thanks
> >
> >
> > > #define VIRTIO_NET_HDR_GSO_ECN 0x80 /* TCP has ECN set */
> > > __u8 gso_type;
> > > __virtio16 hdr_len; /* Ethernet + IP + tcp/udp hdrs */
> >
>
On Tue, May 11, 2021 at 4:24 PM Yuri Benditovich
<[email protected]> wrote:
>
> On Tue, May 11, 2021 at 9:47 AM Jason Wang <[email protected]> wrote:
> >
> >
> > 在 2021/5/11 下午12:42, Yuri Benditovich 写道:
> > > Large UDP packet provided by the guest with GSO type set to
> > > VIRTIO_NET_HDR_GSO_UDP_L4 will be divided to several UDP
> > > packets according to the gso_size field.
> > >
> > > Signed-off-by: Yuri Benditovich <[email protected]>
> > > ---
> > > include/linux/virtio_net.h | 5 +++++
> > > 1 file changed, 5 insertions(+)
> > >
> > > diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
> > > index b465f8f3e554..4ecf9a1ca912 100644
> > > --- a/include/linux/virtio_net.h
> > > +++ b/include/linux/virtio_net.h
> > > @@ -51,6 +51,11 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
> > > ip_proto = IPPROTO_UDP;
> > > thlen = sizeof(struct udphdr);
> > > break;
> > > + case VIRTIO_NET_HDR_GSO_UDP_L4:
> > > + gso_type = SKB_GSO_UDP_L4;
> > > + ip_proto = IPPROTO_UDP;
> > > + thlen = sizeof(struct udphdr);
> > > + break;
> >
> >
> > This is only for rx, how about tx?
>
> In terms of the guest this is only for TX.
So virtio_net_hdr_to_skb() can be called by all the followings:
1) receive_buf() which is guest RX.
2) tun_get_user() which is guest TX
3) tap_get_user() which is guest TX
4) {t}packet_send() which is userspace TX
So it touches for both RX and TX.
> Guest RX is a different thing, this is actually coalescing of
> segmented UDP packets into a large one.
Another case, the packet could be sent from another VM (like the UFO case).
Supporting that for both TX and RX and greatly improve the performance
of VM2VM traffic.
Thanks
> This feature is not defined in the virtio spec yet and the support of
> it first of all depends on the OS.
> For example: TCP LSO (guest TX) is supported almost by all the
> versions of Windows.
> TCP RSC (coalescing of TCP segments) is supported by Win 8 / Server 2012 and up.
> UDP segmentation is supported by Windows kernels 1903+
> UDP coalescing is defined by Windows kernels 2004+ and not supported
> by the driver yet.
>
> >
> > Thanks
> >
> >
> >
> > > default:
> > > return -EINVAL;
> > > }
> >
>
On Tue, May 11, 2021 at 9:50 AM Jason Wang <[email protected]> wrote:
>
>
> 在 2021/5/11 下午12:42, Yuri Benditovich 写道:
> > Signed-off-by: Yuri Benditovich <[email protected]>
> > ---
> > drivers/net/tun.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index 84f832806313..a35054f9d941 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -2812,7 +2812,7 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
> > arg &= ~(TUN_F_TSO4|TUN_F_TSO6);
> > }
> >
> > - arg &= ~TUN_F_UFO;
> > + arg &= ~(TUN_F_UFO|TUN_F_USO);
>
>
> It looks to me kernel doesn't use "USO", so TUN_F_UDP_GSO_L4 is a better
> name for this
No problem, I can change it in v2
and I guess we should toggle NETIF_F_UDP_GSO_l4 here?
No, we do not, because this indicates only the fact that the guest can
send large UDP packets and have them splitted to UDP segments.
>
> And how about macvtap?
We will check how to do that for macvtap. We will send a separate
patch for macvtap or ask for advice.
>
> Thanks
>
>
> > }
> >
> > /* This gives the user a way to test for new features in future by
>
On Tue, May 11, 2021 at 11:24 AM Jason Wang <[email protected]> wrote:
>
> On Tue, May 11, 2021 at 4:12 PM Yuri Benditovich
> <[email protected]> wrote:
> >
> > On Tue, May 11, 2021 at 9:47 AM Jason Wang <[email protected]> wrote:
> > >
> > >
> > > 在 2021/5/11 下午12:42, Yuri Benditovich 写道:
> > > > Define feature bit and GSO type according to the VIRTIO
> > > > specification.
> > > >
> > > > Signed-off-by: Yuri Benditovich <[email protected]>
> > > > ---
> > > > include/uapi/linux/virtio_net.h | 2 ++
> > > > 1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> > > > index 3f55a4215f11..a556ac735d7f 100644
> > > > --- a/include/uapi/linux/virtio_net.h
> > > > +++ b/include/uapi/linux/virtio_net.h
> > > > @@ -57,6 +57,7 @@
> > > > * Steering */
> > > > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
> > > >
> > > > +#define VIRTIO_NET_F_HOST_USO 56 /* Host can handle USO packets */
> >
> > This is the virtio-net feature
>
> Right, I miss this part.
>
> >
> > > > #define VIRTIO_NET_F_HASH_REPORT 57 /* Supports hash report */
> > > > #define VIRTIO_NET_F_RSS 60 /* Supports RSS RX steering */
> > > > #define VIRTIO_NET_F_RSC_EXT 61 /* extended coalescing info */
> > > > @@ -130,6 +131,7 @@ struct virtio_net_hdr_v1 {
> > > > #define VIRTIO_NET_HDR_GSO_TCPV4 1 /* GSO frame, IPv4 TCP (TSO) */
> > > > #define VIRTIO_NET_HDR_GSO_UDP 3 /* GSO frame, IPv4 UDP (UFO) */
> > > > #define VIRTIO_NET_HDR_GSO_TCPV6 4 /* GSO frame, IPv6 TCP */
> > > > +#define VIRTIO_NET_HDR_GSO_UDP_L4 5 /* GSO frame, IPv4 UDP (USO) */
> >
> > This is respective GSO type
> >
> > >
> > >
> > > This is the gso_type not the feature actually.
> > >
> > > I wonder what's the reason for not
> > >
> > > 1) introducing a dedicated virtio-net feature bit for this
> > > (VIRTIO_NET_F_GUEST_GSO_UDP_L4.
> >
> > This series is not for GUEST's feature, it is only for host feature.
> >
> > > 2) toggle the NETIF_F_GSO_UDP_L4 feature for tuntap based on the
> > > negotiated feature.
> >
> > The NETIF_F_GSO_UDP_L4 would be required for the guest RX path.
> > The guest TX path does not require any flags to be propagated, it only
> > allows the guest to transmit large UDP packets and have them
> > automatically splitted.
> > (This is similar to HOST_UFO but does packet segmentation instead of
> > fragmentation. GUEST_UFO indeed requires a respective NETIF flag, as
> > it is unclear whether the guest is capable of receiving such packets).
>
> So I think it's better to implement TX/RX in the same series unless
> there's something missed:
>
> For Guest TX, NETIF_F_GSO_UDP_L4 needs to be enabled in the guest
> virtio-net only when VIRTIO_NET_F_HOST_USO is negotiated.
I understand that this is what should be done when this feature will
be added to Linux virtio-net driver.
But at the moment we do not have enough resources to work on it.
Currently we have a clear use case and ability to test in on Windows guest.
Respective QEMU changes are pending for kernel patches, current
reference is https://github.com/daynix/qemu/tree/uso
> For guest RX, NETIF_F_GSO_UDP_L4 needs to be enabled on the host
> tuntap only when VIRTIO_NET_F_GUEST_USO is neogiated.
Currently we are not able to use guest RX UDP GSO.
In order to do that we at least should be able to build our Windows
drivers with the most updated driver development kit (2004+).
At the moment we can't, this task is in a plan but can take several
months. So we do not have a test/use case with Windows VM.
>
> Thanks
>
> >
> > >
> > > Thanks
> > >
> > >
> > > > #define VIRTIO_NET_HDR_GSO_ECN 0x80 /* TCP has ECN set */
> > > > __u8 gso_type;
> > > > __virtio16 hdr_len; /* Ethernet + IP + tcp/udp hdrs */
> > >
> >
>
On Tue, May 11, 2021 at 12:43 AM Yuri Benditovich
<[email protected]> wrote:
>
> Large UDP packet provided by the guest with GSO type set to
> VIRTIO_NET_HDR_GSO_UDP_L4 will be divided to several UDP
> packets according to the gso_size field.
>
> Signed-off-by: Yuri Benditovich <[email protected]>
> ---
> include/linux/virtio_net.h | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
> index b465f8f3e554..4ecf9a1ca912 100644
> --- a/include/linux/virtio_net.h
> +++ b/include/linux/virtio_net.h
> @@ -51,6 +51,11 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
> ip_proto = IPPROTO_UDP;
> thlen = sizeof(struct udphdr);
> break;
> + case VIRTIO_NET_HDR_GSO_UDP_L4:
> + gso_type = SKB_GSO_UDP_L4;
> + ip_proto = IPPROTO_UDP;
> + thlen = sizeof(struct udphdr);
> + break;
If adding a new VIRTIO_NET_HDR type I suggest adding separate IPv4 and
IPv6 variants, analogous to VIRTIO_NET_HDR_GSO_TCPV[46]. To avoid
having to infer protocol again, as for UDP fragmentation offload (the
retry case below this code).
On Tue, May 11, 2021 at 11:33 AM Yuri Benditovich
<[email protected]> wrote:
>
> On Tue, May 11, 2021 at 9:50 AM Jason Wang <[email protected]> wrote:
> >
> >
> > 在 2021/5/11 下午12:42, Yuri Benditovich 写道:
> > > Signed-off-by: Yuri Benditovich <[email protected]>
> > > ---
> > > drivers/net/tun.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > index 84f832806313..a35054f9d941 100644
> > > --- a/drivers/net/tun.c
> > > +++ b/drivers/net/tun.c
> > > @@ -2812,7 +2812,7 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
> > > arg &= ~(TUN_F_TSO4|TUN_F_TSO6);
> > > }
> > >
> > > - arg &= ~TUN_F_UFO;
> > > + arg &= ~(TUN_F_UFO|TUN_F_USO);
> >
> >
> > It looks to me kernel doesn't use "USO", so TUN_F_UDP_GSO_L4 is a better
> > name for this
>
> No problem, I can change it in v2
>
> and I guess we should toggle NETIF_F_UDP_GSO_l4 here?
>
> No, we do not, because this indicates only the fact that the guest can
> send large UDP packets and have them splitted to UDP segments.
>
> >
> > And how about macvtap?
>
> We will check how to do that for macvtap. We will send a separate
> patch for macvtap or ask for advice.
>
I'll add this feature to the tap.c also (AFAIU this will enable the
USO for macvtap).
Please correct me if I'm mistaken.
> >
> > Thanks
> >
> >
> > > }
> > >
> > > /* This gives the user a way to test for new features in future by
> >
在 2021/5/11 下午5:21, Yuri Benditovich 写道:
> On Tue, May 11, 2021 at 11:24 AM Jason Wang <[email protected]> wrote:
>> On Tue, May 11, 2021 at 4:12 PM Yuri Benditovich
>> <[email protected]> wrote:
>>> On Tue, May 11, 2021 at 9:47 AM Jason Wang <[email protected]> wrote:
>>>>
>>>> 在 2021/5/11 下午12:42, Yuri Benditovich 写道:
>>>>> Define feature bit and GSO type according to the VIRTIO
>>>>> specification.
>>>>>
>>>>> Signed-off-by: Yuri Benditovich <[email protected]>
>>>>> ---
>>>>> include/uapi/linux/virtio_net.h | 2 ++
>>>>> 1 file changed, 2 insertions(+)
>>>>>
>>>>> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
>>>>> index 3f55a4215f11..a556ac735d7f 100644
>>>>> --- a/include/uapi/linux/virtio_net.h
>>>>> +++ b/include/uapi/linux/virtio_net.h
>>>>> @@ -57,6 +57,7 @@
>>>>> * Steering */
>>>>> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
>>>>>
>>>>> +#define VIRTIO_NET_F_HOST_USO 56 /* Host can handle USO packets */
>>> This is the virtio-net feature
>> Right, I miss this part.
>>
>>>>> #define VIRTIO_NET_F_HASH_REPORT 57 /* Supports hash report */
>>>>> #define VIRTIO_NET_F_RSS 60 /* Supports RSS RX steering */
>>>>> #define VIRTIO_NET_F_RSC_EXT 61 /* extended coalescing info */
>>>>> @@ -130,6 +131,7 @@ struct virtio_net_hdr_v1 {
>>>>> #define VIRTIO_NET_HDR_GSO_TCPV4 1 /* GSO frame, IPv4 TCP (TSO) */
>>>>> #define VIRTIO_NET_HDR_GSO_UDP 3 /* GSO frame, IPv4 UDP (UFO) */
>>>>> #define VIRTIO_NET_HDR_GSO_TCPV6 4 /* GSO frame, IPv6 TCP */
>>>>> +#define VIRTIO_NET_HDR_GSO_UDP_L4 5 /* GSO frame, IPv4 UDP (USO) */
>>> This is respective GSO type
>>>
>>>>
>>>> This is the gso_type not the feature actually.
>>>>
>>>> I wonder what's the reason for not
>>>>
>>>> 1) introducing a dedicated virtio-net feature bit for this
>>>> (VIRTIO_NET_F_GUEST_GSO_UDP_L4.
>>> This series is not for GUEST's feature, it is only for host feature.
>>>
>>>> 2) toggle the NETIF_F_GSO_UDP_L4 feature for tuntap based on the
>>>> negotiated feature.
>>> The NETIF_F_GSO_UDP_L4 would be required for the guest RX path.
>>> The guest TX path does not require any flags to be propagated, it only
>>> allows the guest to transmit large UDP packets and have them
>>> automatically splitted.
>>> (This is similar to HOST_UFO but does packet segmentation instead of
>>> fragmentation. GUEST_UFO indeed requires a respective NETIF flag, as
>>> it is unclear whether the guest is capable of receiving such packets).
>> So I think it's better to implement TX/RX in the same series unless
>> there's something missed:
>>
>> For Guest TX, NETIF_F_GSO_UDP_L4 needs to be enabled in the guest
>> virtio-net only when VIRTIO_NET_F_HOST_USO is negotiated.
> I understand that this is what should be done when this feature will
> be added to Linux virtio-net driver.
> But at the moment we do not have enough resources to work on it.
> Currently we have a clear use case and ability to test in on Windows guest.
> Respective QEMU changes are pending for kernel patches, current
> reference is https://github.com/daynix/qemu/tree/uso
This looks fine but as replied in another thread.
We can test both TX and RX with Linux guests simply:
We can just use 2 VMs, and let one VM send GSO_UDP_L4 packet to another,
then both tx and rx in both guest (virtio-net) and host (virtio-net) are
tested?
Thanks
>
>> For guest RX, NETIF_F_GSO_UDP_L4 needs to be enabled on the host
>> tuntap only when VIRTIO_NET_F_GUEST_USO is neogiated.
> Currently we are not able to use guest RX UDP GSO.
> In order to do that we at least should be able to build our Windows
> drivers with the most updated driver development kit (2004+).
> At the moment we can't, this task is in a plan but can take several
> months. So we do not have a test/use case with Windows VM.
>
>
>> Thanks
>>
>>>> Thanks
>>>>
>>>>
>>>>> #define VIRTIO_NET_HDR_GSO_ECN 0x80 /* TCP has ECN set */
>>>>> __u8 gso_type;
>>>>> __virtio16 hdr_len; /* Ethernet + IP + tcp/udp hdrs */
在 2021/5/11 下午4:33, Yuri Benditovich 写道:
> On Tue, May 11, 2021 at 9:50 AM Jason Wang <[email protected]> wrote:
>>
>> 在 2021/5/11 下午12:42, Yuri Benditovich 写道:
>>> Signed-off-by: Yuri Benditovich <[email protected]>
>>> ---
>>> drivers/net/tun.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>> index 84f832806313..a35054f9d941 100644
>>> --- a/drivers/net/tun.c
>>> +++ b/drivers/net/tun.c
>>> @@ -2812,7 +2812,7 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
>>> arg &= ~(TUN_F_TSO4|TUN_F_TSO6);
>>> }
>>>
>>> - arg &= ~TUN_F_UFO;
>>> + arg &= ~(TUN_F_UFO|TUN_F_USO);
>>
>> It looks to me kernel doesn't use "USO", so TUN_F_UDP_GSO_L4 is a better
>> name for this
> No problem, I can change it in v2
>
> and I guess we should toggle NETIF_F_UDP_GSO_l4 here?
>
> No, we do not, because this indicates only the fact that the guest can
> send large UDP packets and have them splitted to UDP segments.
Actually the reverse. The set_offload() controls the tuntap TX path
(guest RX path).
When VIRTIO_NET_F_GUEST_XXX was not negotiated, the corresponding netdev
features needs to be disabled. When host tries to send those packets to
guest, it needs to do software segmentation.
See virtio_net_apply_guest_offloads().
There's currently no way (or not need) to prevent tuntap from receiving
GSO packets.
Thanks
>
>> And how about macvtap?
> We will check how to do that for macvtap. We will send a separate
> patch for macvtap or ask for advice.
>
>> Thanks
>>
>>
>>> }
>>>
>>> /* This gives the user a way to test for new features in future by
On Wed, May 12, 2021 at 4:33 AM Jason Wang <[email protected]> wrote:
>
>
> 在 2021/5/11 下午4:33, Yuri Benditovich 写道:
> > On Tue, May 11, 2021 at 9:50 AM Jason Wang <[email protected]> wrote:
> >>
> >> 在 2021/5/11 下午12:42, Yuri Benditovich 写道:
> >>> Signed-off-by: Yuri Benditovich <[email protected]>
> >>> ---
> >>> drivers/net/tun.c | 2 +-
> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> >>> index 84f832806313..a35054f9d941 100644
> >>> --- a/drivers/net/tun.c
> >>> +++ b/drivers/net/tun.c
> >>> @@ -2812,7 +2812,7 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
> >>> arg &= ~(TUN_F_TSO4|TUN_F_TSO6);
> >>> }
> >>>
> >>> - arg &= ~TUN_F_UFO;
> >>> + arg &= ~(TUN_F_UFO|TUN_F_USO);
> >>
> >> It looks to me kernel doesn't use "USO", so TUN_F_UDP_GSO_L4 is a better
> >> name for this
> > No problem, I can change it in v2
> >
> > and I guess we should toggle NETIF_F_UDP_GSO_l4 here?
> >
> > No, we do not, because this indicates only the fact that the guest can
> > send large UDP packets and have them splitted to UDP segments.
>
>
> Actually the reverse. The set_offload() controls the tuntap TX path
> (guest RX path).
The set_offloads does 2 things:
1. At the initialization time qemu probes set_offload(something) to
check which features are supported by TAP/TUN.
2. Later it configures the guest RX path according to guest's needs/capabilities
Typical initialization sequence is (in case the QEMU supports USO feature):
TAP/TUN set offload 11 (probe for UFO support)
TAP/TUN set offload 21 (probe for USO support)
TAP/TUN set offload 0
...
TAP/TUN set offload 7 (configuration of offloads according to GUEST features)
This series of patches is for VIRTIO_NET_F_HOST_USO only, virtio-net
features like VIRTIO_NET_F_GUEST_USO_(4/6/whatever) are not defined in
the spec yet.
>
> When VIRTIO_NET_F_GUEST_XXX was not negotiated, the corresponding netdev
> features needs to be disabled. When host tries to send those packets to
> guest, it needs to do software segmentation.
>
> See virtio_net_apply_guest_offloads().
>
> There's currently no way (or not need) to prevent tuntap from receiving
> GSO packets.
>
> Thanks
>
>
> >
> >> And how about macvtap?
> > We will check how to do that for macvtap. We will send a separate
> > patch for macvtap or ask for advice.
> >
> >> Thanks
> >>
> >>
> >>> }
> >>>
> >>> /* This gives the user a way to test for new features in future by
>
On Tue, May 11, 2021 at 8:48 PM Willem de Bruijn
<[email protected]> wrote:
>
> On Tue, May 11, 2021 at 12:43 AM Yuri Benditovich
> <[email protected]> wrote:
> >
> > Large UDP packet provided by the guest with GSO type set to
> > VIRTIO_NET_HDR_GSO_UDP_L4 will be divided to several UDP
> > packets according to the gso_size field.
> >
> > Signed-off-by: Yuri Benditovich <[email protected]>
> > ---
> > include/linux/virtio_net.h | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
> > index b465f8f3e554..4ecf9a1ca912 100644
> > --- a/include/linux/virtio_net.h
> > +++ b/include/linux/virtio_net.h
> > @@ -51,6 +51,11 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
> > ip_proto = IPPROTO_UDP;
> > thlen = sizeof(struct udphdr);
> > break;
> > + case VIRTIO_NET_HDR_GSO_UDP_L4:
> > + gso_type = SKB_GSO_UDP_L4;
> > + ip_proto = IPPROTO_UDP;
> > + thlen = sizeof(struct udphdr);
> > + break;
>
> If adding a new VIRTIO_NET_HDR type I suggest adding separate IPv4 and
> IPv6 variants, analogous to VIRTIO_NET_HDR_GSO_TCPV[46]. To avoid
> having to infer protocol again, as for UDP fragmentation offload (the
> retry case below this code).
Thank you for denoting this important point of distinguishing between v4 and v6.
Let's try to take a deeper look to see what is the correct thing to do
and please correct me if I'm wrong:
1. For USO we do not need to guess the protocol as it is used with
VIRTIO_NET_HDR_F_NEEDS_CSUM (unlike UFO) and the USO packets
transmitted by the guest are under the same clause as both
VIRTIO_NET_HDR_GSO_TCP, i.e. under if (hdr->flags &
VIRTIO_NET_HDR_F_NEEDS_CSUM) {
2. If we even define VIRTIO_NET_HDR_GSO_UDPv4_L4 and
VIRTIO_NET_HDR_GSO_UDPv6_L4 - both will be translated to
SKB_GSO_UDP_L4, so this information is immediately lost (the code will
look like:
case VIRTIO_NET_HDR_GSO_UDP4_L4: case VIRTIO_NET_HDR_GSO_UDP6_L4
gso_type = SKB_GSO_UDP;
3. When we will define the respective guest features (like
VIRTIO_NET_F_HOST_USO4 VIRTIO_NET_F_HOST_USO6) we will need to
recreate the virtio_net header from the skb when both v4 and v6 have
the same SKB_GSO_UDP_L4, (see virtio_net_hdr_from_skb) and I'm not
sure whether somebody needs the exact v4 or v6 information on guest RX
path.
4. What is completely correct is that when we will start working with
the guest RX path we will need to define something like NETIF_F_USO4
and NETIF_F_USO6 and configure them according to exact guest offload
capabilities.
Do you agree?
On Wed, May 12, 2021 at 2:10 AM Yuri Benditovich
<[email protected]> wrote:
>
> On Tue, May 11, 2021 at 8:48 PM Willem de Bruijn
> <[email protected]> wrote:
> >
> > On Tue, May 11, 2021 at 12:43 AM Yuri Benditovich
> > <[email protected]> wrote:
> > >
> > > Large UDP packet provided by the guest with GSO type set to
> > > VIRTIO_NET_HDR_GSO_UDP_L4 will be divided to several UDP
> > > packets according to the gso_size field.
> > >
> > > Signed-off-by: Yuri Benditovich <[email protected]>
> > > ---
> > > include/linux/virtio_net.h | 5 +++++
> > > 1 file changed, 5 insertions(+)
> > >
> > > diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
> > > index b465f8f3e554..4ecf9a1ca912 100644
> > > --- a/include/linux/virtio_net.h
> > > +++ b/include/linux/virtio_net.h
> > > @@ -51,6 +51,11 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
> > > ip_proto = IPPROTO_UDP;
> > > thlen = sizeof(struct udphdr);
> > > break;
> > > + case VIRTIO_NET_HDR_GSO_UDP_L4:
> > > + gso_type = SKB_GSO_UDP_L4;
> > > + ip_proto = IPPROTO_UDP;
> > > + thlen = sizeof(struct udphdr);
> > > + break;
> >
> > If adding a new VIRTIO_NET_HDR type I suggest adding separate IPv4 and
> > IPv6 variants, analogous to VIRTIO_NET_HDR_GSO_TCPV[46]. To avoid
> > having to infer protocol again, as for UDP fragmentation offload (the
> > retry case below this code).
>
> Thank you for denoting this important point of distinguishing between v4 and v6.
> Let's try to take a deeper look to see what is the correct thing to do
> and please correct me if I'm wrong:
> 1. For USO we do not need to guess the protocol as it is used with
> VIRTIO_NET_HDR_F_NEEDS_CSUM (unlike UFO)
Enforcing that is a good start. We should also enforce that
skb->protocol is initialized to one of htons(ETH_P_IP) or
htons(ETH_P_IPV6), so that it does not have to be inferred by parsing.
These requirements were not enforced for previous values, and cannot
be introduced afterwards, which has led to have to add that extra code
to handle these obscure edge cases.
I agree that with well behaved configurations, the need for separate
_V4 and _V6 variants is not needed.
> and the USO packets
> transmitted by the guest are under the same clause as both
> VIRTIO_NET_HDR_GSO_TCP, i.e. under if (hdr->flags &
> VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> 2. If we even define VIRTIO_NET_HDR_GSO_UDPv4_L4 and
> VIRTIO_NET_HDR_GSO_UDPv6_L4 - both will be translated to
> SKB_GSO_UDP_L4, so this information is immediately lost (the code will
> look like:
> case VIRTIO_NET_HDR_GSO_UDP4_L4: case VIRTIO_NET_HDR_GSO_UDP6_L4
> gso_type = SKB_GSO_UDP;
>
> 3. When we will define the respective guest features (like
> VIRTIO_NET_F_HOST_USO4 VIRTIO_NET_F_HOST_USO6) we will need to
> recreate the virtio_net header from the skb when both v4 and v6 have
> the same SKB_GSO_UDP_L4, (see virtio_net_hdr_from_skb) and I'm not
> sure whether somebody needs the exact v4 or v6 information on guest RX
> path.
FWIW, it is good to keep in mind that virtio_net_hdr is also used
outside virtio, in both ingress and egress paths.
> 4. What is completely correct is that when we will start working with
> the guest RX path we will need to define something like NETIF_F_USO4
> and NETIF_F_USO6 and configure them according to exact guest offload
> capabilities.
> Do you agree?
I don't immediately see the need for advertising this device feature
on a per-protocol basis. Can you elaborate?
On Wed, May 12, 2021 at 2:56 PM Yuri Benditovich
<[email protected]> wrote:
>
> On Wed, May 12, 2021 at 5:33 PM Willem de Bruijn
> <[email protected]> wrote:
> >
> > On Wed, May 12, 2021 at 2:10 AM Yuri Benditovich
> > <[email protected]> wrote:
> > >
> > > On Tue, May 11, 2021 at 8:48 PM Willem de Bruijn
> > > <[email protected]> wrote:
> > > >
> > > > On Tue, May 11, 2021 at 12:43 AM Yuri Benditovich
> > > > <[email protected]> wrote:
> > > > >
> > > > > Large UDP packet provided by the guest with GSO type set to
> > > > > VIRTIO_NET_HDR_GSO_UDP_L4 will be divided to several UDP
> > > > > packets according to the gso_size field.
> > > > >
> > > > > Signed-off-by: Yuri Benditovich <[email protected]>
> > > > > ---
> > > > > include/linux/virtio_net.h | 5 +++++
> > > > > 1 file changed, 5 insertions(+)
> > > > >
> > > > > diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
> > > > > index b465f8f3e554..4ecf9a1ca912 100644
> > > > > --- a/include/linux/virtio_net.h
> > > > > +++ b/include/linux/virtio_net.h
> > > > > @@ -51,6 +51,11 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
> > > > > ip_proto = IPPROTO_UDP;
> > > > > thlen = sizeof(struct udphdr);
> > > > > break;
> > > > > + case VIRTIO_NET_HDR_GSO_UDP_L4:
> > > > > + gso_type = SKB_GSO_UDP_L4;
> > > > > + ip_proto = IPPROTO_UDP;
> > > > > + thlen = sizeof(struct udphdr);
> > > > > + break;
> > > >
> > > > If adding a new VIRTIO_NET_HDR type I suggest adding separate IPv4 and
> > > > IPv6 variants, analogous to VIRTIO_NET_HDR_GSO_TCPV[46]. To avoid
> > > > having to infer protocol again, as for UDP fragmentation offload (the
> > > > retry case below this code).
> > >
> > > Thank you for denoting this important point of distinguishing between v4 and v6.
> > > Let's try to take a deeper look to see what is the correct thing to do
> > > and please correct me if I'm wrong:
> > > 1. For USO we do not need to guess the protocol as it is used with
> > > VIRTIO_NET_HDR_F_NEEDS_CSUM (unlike UFO)
> >
> > Enforcing that is a good start. We should also enforce that
> > skb->protocol is initialized to one of htons(ETH_P_IP) or
> > htons(ETH_P_IPV6), so that it does not have to be inferred by parsing.
>
> As this feature is new and is not used in any public release of any
> misbehaving driver, probably it is enough to state in the spec that
> VIRTIO_NET_HDR_F_NEEDS_CSUM is required for USO packets.
> The spec states that the USO feature requires checksumming feature.
The spec is not sufficient. These rules need to be enforced in the
kernel code, too.
> >
> > These requirements were not enforced for previous values, and cannot
> > be introduced afterwards, which has led to have to add that extra code
> > to handle these obscure edge cases.
> >
> > I agree that with well behaved configurations, the need for separate
> > _V4 and _V6 variants is not needed.
> >
> > > and the USO packets
> > > transmitted by the guest are under the same clause as both
> > > VIRTIO_NET_HDR_GSO_TCP, i.e. under if (hdr->flags &
> > > VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> > > 2. If we even define VIRTIO_NET_HDR_GSO_UDPv4_L4 and
> > > VIRTIO_NET_HDR_GSO_UDPv6_L4 - both will be translated to
> > > SKB_GSO_UDP_L4, so this information is immediately lost (the code will
> > > look like:
> > > case VIRTIO_NET_HDR_GSO_UDP4_L4: case VIRTIO_NET_HDR_GSO_UDP6_L4
> > > gso_type = SKB_GSO_UDP;
> > >
> > > 3. When we will define the respective guest features (like
> > > VIRTIO_NET_F_HOST_USO4 VIRTIO_NET_F_HOST_USO6) we will need to
> This is my typo: VIRTIO_NET_F_GUEST_USO4...
> > > recreate the virtio_net header from the skb when both v4 and v6 have
> > > the same SKB_GSO_UDP_L4, (see virtio_net_hdr_from_skb) and I'm not
> > > sure whether somebody needs the exact v4 or v6 information on guest RX
> > > path.
> >
> > FWIW, it is good to keep in mind that virtio_net_hdr is also used
> > outside virtio, in both ingress and egress paths.
>
> Can you please elaborate in which scenarios we do not have any virtio
> device in path but need virtio_net_hdr?
Packet sockets, tuntap.
> > > 4. What is completely correct is that when we will start working with
> > > the guest RX path we will need to define something like NETIF_F_USO4
> > > and NETIF_F_USO6 and configure them according to exact guest offload
> > > capabilities.
> > > Do you agree?
> >
> > I don't immediately see the need for advertising this device feature
> > on a per-protocol basis. Can you elaborate?
>
> Separate offload setting (controlled by the guest) for v4 and v6 in
> guest RX path is mandatory, at least Windows always requires this for
> any offload.
> In this case it seems easy to have also virtio-net device features to
> be indicated separately (the TAP/TUN should report its capabilities).
Ah, ok.
On Wed, May 12, 2021 at 5:33 PM Willem de Bruijn
<[email protected]> wrote:
>
> On Wed, May 12, 2021 at 2:10 AM Yuri Benditovich
> <[email protected]> wrote:
> >
> > On Tue, May 11, 2021 at 8:48 PM Willem de Bruijn
> > <[email protected]> wrote:
> > >
> > > On Tue, May 11, 2021 at 12:43 AM Yuri Benditovich
> > > <[email protected]> wrote:
> > > >
> > > > Large UDP packet provided by the guest with GSO type set to
> > > > VIRTIO_NET_HDR_GSO_UDP_L4 will be divided to several UDP
> > > > packets according to the gso_size field.
> > > >
> > > > Signed-off-by: Yuri Benditovich <[email protected]>
> > > > ---
> > > > include/linux/virtio_net.h | 5 +++++
> > > > 1 file changed, 5 insertions(+)
> > > >
> > > > diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
> > > > index b465f8f3e554..4ecf9a1ca912 100644
> > > > --- a/include/linux/virtio_net.h
> > > > +++ b/include/linux/virtio_net.h
> > > > @@ -51,6 +51,11 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
> > > > ip_proto = IPPROTO_UDP;
> > > > thlen = sizeof(struct udphdr);
> > > > break;
> > > > + case VIRTIO_NET_HDR_GSO_UDP_L4:
> > > > + gso_type = SKB_GSO_UDP_L4;
> > > > + ip_proto = IPPROTO_UDP;
> > > > + thlen = sizeof(struct udphdr);
> > > > + break;
> > >
> > > If adding a new VIRTIO_NET_HDR type I suggest adding separate IPv4 and
> > > IPv6 variants, analogous to VIRTIO_NET_HDR_GSO_TCPV[46]. To avoid
> > > having to infer protocol again, as for UDP fragmentation offload (the
> > > retry case below this code).
> >
> > Thank you for denoting this important point of distinguishing between v4 and v6.
> > Let's try to take a deeper look to see what is the correct thing to do
> > and please correct me if I'm wrong:
> > 1. For USO we do not need to guess the protocol as it is used with
> > VIRTIO_NET_HDR_F_NEEDS_CSUM (unlike UFO)
>
> Enforcing that is a good start. We should also enforce that
> skb->protocol is initialized to one of htons(ETH_P_IP) or
> htons(ETH_P_IPV6), so that it does not have to be inferred by parsing.
As this feature is new and is not used in any public release of any
misbehaving driver, probably it is enough to state in the spec that
VIRTIO_NET_HDR_F_NEEDS_CSUM is required for USO packets.
The spec states that the USO feature requires checksumming feature.
>
> These requirements were not enforced for previous values, and cannot
> be introduced afterwards, which has led to have to add that extra code
> to handle these obscure edge cases.
>
> I agree that with well behaved configurations, the need for separate
> _V4 and _V6 variants is not needed.
>
> > and the USO packets
> > transmitted by the guest are under the same clause as both
> > VIRTIO_NET_HDR_GSO_TCP, i.e. under if (hdr->flags &
> > VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> > 2. If we even define VIRTIO_NET_HDR_GSO_UDPv4_L4 and
> > VIRTIO_NET_HDR_GSO_UDPv6_L4 - both will be translated to
> > SKB_GSO_UDP_L4, so this information is immediately lost (the code will
> > look like:
> > case VIRTIO_NET_HDR_GSO_UDP4_L4: case VIRTIO_NET_HDR_GSO_UDP6_L4
> > gso_type = SKB_GSO_UDP;
> >
> > 3. When we will define the respective guest features (like
> > VIRTIO_NET_F_HOST_USO4 VIRTIO_NET_F_HOST_USO6) we will need to
This is my typo: VIRTIO_NET_F_GUEST_USO4...
> > recreate the virtio_net header from the skb when both v4 and v6 have
> > the same SKB_GSO_UDP_L4, (see virtio_net_hdr_from_skb) and I'm not
> > sure whether somebody needs the exact v4 or v6 information on guest RX
> > path.
>
> FWIW, it is good to keep in mind that virtio_net_hdr is also used
> outside virtio, in both ingress and egress paths.
Can you please elaborate in which scenarios we do not have any virtio
device in path but need virtio_net_hdr?
>
> > 4. What is completely correct is that when we will start working with
> > the guest RX path we will need to define something like NETIF_F_USO4
> > and NETIF_F_USO6 and configure them according to exact guest offload
> > capabilities.
> > Do you agree?
>
> I don't immediately see the need for advertising this device feature
> on a per-protocol basis. Can you elaborate?
Separate offload setting (controlled by the guest) for v4 and v6 in
guest RX path is mandatory, at least Windows always requires this for
any offload.
In this case it seems easy to have also virtio-net device features to
be indicated separately (the TAP/TUN should report its capabilities).
On Thu, May 13, 2021 at 12:36 PM Yuri Benditovich
<[email protected]> wrote:
>
> On Thu, May 13, 2021 at 5:07 AM Jason Wang <[email protected]> wrote:
> >
> > On Wed, May 12, 2021 at 6:37 PM Yuri Benditovich
> > <[email protected]> wrote:
> > >
> > > On Wed, May 12, 2021 at 12:10 PM Jason Wang <[email protected]> wrote:
> > > >
> > > > On Wed, May 12, 2021 at 4:32 PM Yuri Benditovich
> > > > <[email protected]> wrote:
> > > > >
> > > > > On Wed, May 12, 2021 at 10:50 AM Jason Wang <[email protected]> wrote:
> > > > > >
> > > > > > On Wed, May 12, 2021 at 1:24 PM Yuri Benditovich
> > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > On Wed, May 12, 2021 at 4:33 AM Jason Wang <[email protected]> wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > 在 2021/5/11 下午4:33, Yuri Benditovich 写道:
> > > > > > > > > On Tue, May 11, 2021 at 9:50 AM Jason Wang <[email protected]> wrote:
> > > > > > > > >>
> > > > > > > > >> 在 2021/5/11 下午12:42, Yuri Benditovich 写道:
> > > > > > > > >>> Signed-off-by: Yuri Benditovich <[email protected]>
> > > > > > > > >>> ---
> > > > > > > > >>> drivers/net/tun.c | 2 +-
> > > > > > > > >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > >>>
> > > > > > > > >>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > > > > > > >>> index 84f832806313..a35054f9d941 100644
> > > > > > > > >>> --- a/drivers/net/tun.c
> > > > > > > > >>> +++ b/drivers/net/tun.c
> > > > > > > > >>> @@ -2812,7 +2812,7 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
> > > > > > > > >>> arg &= ~(TUN_F_TSO4|TUN_F_TSO6);
> > > > > > > > >>> }
> > > > > > > > >>>
> > > > > > > > >>> - arg &= ~TUN_F_UFO;
> > > > > > > > >>> + arg &= ~(TUN_F_UFO|TUN_F_USO);
> > > > > > > > >>
> > > > > > > > >> It looks to me kernel doesn't use "USO", so TUN_F_UDP_GSO_L4 is a better
> > > > > > > > >> name for this
> > > > > > > > > No problem, I can change it in v2
> > > > > > > > >
> > > > > > > > > and I guess we should toggle NETIF_F_UDP_GSO_l4 here?
> > > > > > > > >
> > > > > > > > > No, we do not, because this indicates only the fact that the guest can
> > > > > > > > > send large UDP packets and have them splitted to UDP segments.
> > > > > > > >
> > > > > > > >
> > > > > > > > Actually the reverse. The set_offload() controls the tuntap TX path
> > > > > > > > (guest RX path).
> > > > > > >
> > > > > > > The set_offloads does 2 things:
> > > > > > > 1. At the initialization time qemu probes set_offload(something) to
> > > > > > > check which features are supported by TAP/TUN.
> > > > > >
> > > > > > Note that the probing is used for guest RX features not host RX.
> > > > >
> > > > > It looks like the hidden assumption (till now) is that if some feature
> > > > > is present - it exists simultaneously for host and guest.
> > > > > See QEMU get_features: if the TAP/TUN does not have UFO both HOST and
> > > > > GUEST FEATURES are cleared.
> > > >
> > > > Kind of, actually the assumption is: if a guest feature
> > > > (VIRTIO_NET_F_GUEST_XXX) is support, the corresponding host feature
> > > > (VIRTIO_NET_F_HOST_XXX) is also supported.
> > >
> > > So nothing tells us that the TUNSETOFFLOAD is going to set GUEST offloads.
> > > From if_tun.h
> > > #define TUN_F_CSUM 0x01 /* You can hand me unchecksummed packets. */
> > > #define TUN_F_TSO4 0x02 /* I can handle TSO for IPv4 packets */
> > > #define TUN_F_TSO6 0x04 /* I can handle TSO for IPv6 packets */
> > > #define TUN_F_TSO_ECN 0x08 /* I can handle TSO with ECN bits. */
> > > #define TUN_F_UFO 0x10 /* I can handle UFO packets */
> >
> > Yes, that's why I replied in another thread to say that there's no way
> > to refuse GSO packets from userspace, even if TUN_F_XXX is not set via
> > tun_set_offload().
> >
> > E.g you can disable sending GSO packets to guests but you can't reject
> > GSO packets from guest/userspace.
>
> We agree here.
> Sorry for being unclear. I meant following:
> According to the comment the TUN_F_CSUM is a _host_ capability.
> According to the comment the TUN_F_UFO is a _guest_ capability.
>
> But surprisingly when TUN receives TUN_F_UFO it does not propagate it
> anywhere, there is no corresponding NETIF flag.
(It looks like I drop the community and other ccs accidentally, adding
them back and sorry)
Actually, there is one, NETIF_F_GSO_UDP.
Kernel used to have NETIF_F_UFO, but it was removed due to bugs and
the lack of real hardware support. Then we found it breaks uABI, so
Willem tries to make it appear for userspace again, and then it was
renamed to NETIF_F_GSO_UDP.
But I think it's a bug that we don't proporate TUN_F_UFO to NETIF
flag, this is a must for the driver that doesn't support
VIRTIO_NET_F_GUEST_UFO. I just try to disable all offloads and
mrg_rxbuf, then netperf UDP_STREAM from host to guest gives me bad
length packet in the guest.
Willem, I think we probably need to fix this.
> So in fact TUN_F_UFO is processed by the TUN/TAP exactly as a host capability.
>
> >
> > >
> > > So, let's write
> > >
> > > #define TUN_F_UDP_L4TX 0x20 /* You can send me large UDP packets */
> >
> > So if we stick to the assumption "if a guest feature is supported, the
> > corresponding host feature is supported". There's no need for this.
> > And I think it's the most clean way.
>
> My personal opinion is that it is extremely wrong to extend such an
> unobvious assumption to each new feature.
This results in inconsistency with other GSO/CSUM flags. And will
complicate the uAPI (two flags, one for RX another for TX).
Considering the current code works for many years, it's not worth
bothering I think.
>
> >
> > > #define TUN_F_UDP4_L4RX 0x40 /* I can coalesce UDPv4 segments */
> > > #define TUN_F_UDP6_L4RX 0x80 /* I can coalesce UDPv6 segments */
> >
> > Any value to coalesce UDP segments here? It's better to do it in the
> > TX source (guest).
>
> Coalescing is a consent of the guest to receive packets bigger than MTU.
> Otherwise (if the guest does not agree) the host must segment/fragment
> packets before transmitting them to the guest.
This looks like a different feature which is not necessarily known by guests?
Kernel supports GRO which can coalesce packets. (It was not supported
by TAP yet though).
> It is not related to guest TX.
>
> For example, Windows guest is not able to handle large UDP packets
> (this is not supported by the stack yet).
In this case, the corresponding guest or host features will be
disabled, and the kernel won't send those kinds of GSO packets to
guests.
>
> >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > > 2. Later it configures the guest RX path according to guest's needs/capabilities
> > > > > > > Typical initialization sequence is (in case the QEMU supports USO feature):
> > > > > >
> > > > > > It also depends on whether the backend(TAP) has the support for guest RX.
> > > > >
> > > > > In the code of TAP and TUN I do not see any "if the backend has the
> > > > > support for guest RX".
> > > >
> > > > Yes, the detection is implied as you described above.
> > > >
> > > > > This is just the IOCTL and set of TUN_F_* bits. Their meaning is
> > > > > defined in the comments.
> > > > >
> > > > > >
> > > > > > > TAP/TUN set offload 11 (probe for UFO support)
> > > > > > > TAP/TUN set offload 21 (probe for USO support)
> > > > > > > TAP/TUN set offload 0
> > > > > > > ...
> > > > > > > TAP/TUN set offload 7 (configuration of offloads according to GUEST features)
> > > > > > >
> > > > > > > This series of patches is for VIRTIO_NET_F_HOST_USO only, virtio-net
> > > > > > > features like VIRTIO_NET_F_GUEST_USO_(4/6/whatever) are not defined in
> > > > > > > the spec yet.
> > > > > > >
> > > > > >
> > > > > > I'm a little bit confused here. Consider you want to implement guest
> > > > > > TX so there's no need for any modification on the set_offload().
> > > > >
> > > > > I do not think so. Please correct me if I'm mistaken:
> > > > > QEMU needs to indicate the HOST_USO feature (or not indicate).
> > > > > How can QEMU know the kernel is able to support VIRTIO_NET_HDR_GSO_UDP_L4?
> > > >
> > > > Ok, I finally get you idea. Thanks for the patience.
> > > >
> > > > But still one issue: Assume we implement VIRTIO_NET_F_HOST_USO. How
> > > > could we add VIRTIO_NET_F_GUEST_USO in the future? Adding another TUN
> > > > flag for set_offload()? Seems unnecessary and inconsistency with
> > > > current TUN flags.
> > > >
> > > > >
> > > > > >
> > > > > > I think we need to implement both directions at one time as what has
> > > > > > been partially done in this series:
> > > > > >
> > > > >
> > > > > You actually suggest that we need to start from Linux virtio-net
> > > > > driver and implement on it both TX and RX.
> > > > > Our main area is virtio-win drivers and all the rest we do when we can.
> > > > > Currently we have 2 WIP tasks related to Linux (virtio-net RSS and
> > > > > libvirt RSS/eBPF) and (my feeling) we hardly can start with additional
> > > > > one.
> > > >
> > > > I can help for the linux driver if you wish.
> > >
> > > I understand. Probably I've made a mistake from the beginning:
> > > At first stage I've prepared the spec change of what we need in hope
> > > that this will be fast.
> > > Probably the better way was to prepare RFC patches first then start
> > > changing the spec.
> > >
> > > So the question is what to do now:
> > > A)
> > > Finalize patches for guest TX and respective QEMU patches
> > > Prepare RFC patches for guest RX, get ack on them
> > > Change the spec
> > > Finalize patches for guest RX according to the spec
> > >
> > > B)
> > > Reject the patches for guest TX
> > > Prepare RFC patches for everything, get ack on them
> > > Change the spec
> > > Finalize patches for everything according to the spec
> > >
> > > I'm for A) of course :)
> >
> > I'm for B :)
> >
> > The reasons are:
> >
> > 1) keep the assumption of tun_set_offload() to simply the logic and
> > compatibility
> > 2) it's hard or tricky to touch guest TX path only (e.g the
> > virtio_net_hdr_from_skb() is called in both RX and TX)
>
> I suspect there is _some_ misunderstanding here.
> I did not touch virtio_net_hdr_from_skb at all.
>
Typo, actually I meant virtio_net_hdr_to_skb().
Thanks
> >
> > Thanks
> >
> > >
> > > >
> > > > Thanks
> > > >
> > > > > This is a reason why I've added to the virtio spec only HOST_USO and
> > > > > not GUEST_USO4/6.
> > > > > UDP RSC (which is actually guest rx USO) is not available on Windows
> > > > > at the moment.
> > > > >
> > > > > > 1) set_offload() is for guest RX.
> > > > > > 2) virtio_net_hdr_to_skb() is for both guest TX and guest RX.
> > > > > >
> > > > > > For testing, you can run VM2VM on the same host, and you will get
> > > > > > everything tested.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > > >
> > > > > > > > When VIRTIO_NET_F_GUEST_XXX was not negotiated, the corresponding netdev
> > > > > > > > features needs to be disabled. When host tries to send those packets to
> > > > > > > > guest, it needs to do software segmentation.
> > > > > > > >
> > > > > > > > See virtio_net_apply_guest_offloads().
> > > > > > > >
> > > > > > > > There's currently no way (or not need) to prevent tuntap from receiving
> > > > > > > > GSO packets.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > >> And how about macvtap?
> > > > > > > > > We will check how to do that for macvtap. We will send a separate
> > > > > > > > > patch for macvtap or ask for advice.
> > > > > > > > >
> > > > > > > > >> Thanks
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>> }
> > > > > > > > >>>
> > > > > > > > >>> /* This gives the user a way to test for new features in future by
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
On Thu, May 13, 2021 at 10:05 AM Jason Wang <[email protected]> wrote:
>
> On Thu, May 13, 2021 at 12:36 PM Yuri Benditovich
> <[email protected]> wrote:
> >
> > On Thu, May 13, 2021 at 5:07 AM Jason Wang <[email protected]> wrote:
> > >
> > > On Wed, May 12, 2021 at 6:37 PM Yuri Benditovich
> > > <[email protected]> wrote:
> > > >
> > > > On Wed, May 12, 2021 at 12:10 PM Jason Wang <[email protected]> wrote:
> > > > >
> > > > > On Wed, May 12, 2021 at 4:32 PM Yuri Benditovich
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > On Wed, May 12, 2021 at 10:50 AM Jason Wang <[email protected]> wrote:
> > > > > > >
> > > > > > > On Wed, May 12, 2021 at 1:24 PM Yuri Benditovich
> > > > > > > <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Wed, May 12, 2021 at 4:33 AM Jason Wang <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 在 2021/5/11 下午4:33, Yuri Benditovich 写道:
> > > > > > > > > > On Tue, May 11, 2021 at 9:50 AM Jason Wang <[email protected]> wrote:
> > > > > > > > > >>
> > > > > > > > > >> 在 2021/5/11 下午12:42, Yuri Benditovich 写道:
> > > > > > > > > >>> Signed-off-by: Yuri Benditovich <[email protected]>
> > > > > > > > > >>> ---
> > > > > > > > > >>> drivers/net/tun.c | 2 +-
> > > > > > > > > >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > > >>>
> > > > > > > > > >>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > > > > > > > >>> index 84f832806313..a35054f9d941 100644
> > > > > > > > > >>> --- a/drivers/net/tun.c
> > > > > > > > > >>> +++ b/drivers/net/tun.c
> > > > > > > > > >>> @@ -2812,7 +2812,7 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
> > > > > > > > > >>> arg &= ~(TUN_F_TSO4|TUN_F_TSO6);
> > > > > > > > > >>> }
> > > > > > > > > >>>
> > > > > > > > > >>> - arg &= ~TUN_F_UFO;
> > > > > > > > > >>> + arg &= ~(TUN_F_UFO|TUN_F_USO);
> > > > > > > > > >>
> > > > > > > > > >> It looks to me kernel doesn't use "USO", so TUN_F_UDP_GSO_L4 is a better
> > > > > > > > > >> name for this
> > > > > > > > > > No problem, I can change it in v2
> > > > > > > > > >
> > > > > > > > > > and I guess we should toggle NETIF_F_UDP_GSO_l4 here?
> > > > > > > > > >
> > > > > > > > > > No, we do not, because this indicates only the fact that the guest can
> > > > > > > > > > send large UDP packets and have them splitted to UDP segments.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Actually the reverse. The set_offload() controls the tuntap TX path
> > > > > > > > > (guest RX path).
> > > > > > > >
> > > > > > > > The set_offloads does 2 things:
> > > > > > > > 1. At the initialization time qemu probes set_offload(something) to
> > > > > > > > check which features are supported by TAP/TUN.
> > > > > > >
> > > > > > > Note that the probing is used for guest RX features not host RX.
> > > > > >
> > > > > > It looks like the hidden assumption (till now) is that if some feature
> > > > > > is present - it exists simultaneously for host and guest.
> > > > > > See QEMU get_features: if the TAP/TUN does not have UFO both HOST and
> > > > > > GUEST FEATURES are cleared.
> > > > >
> > > > > Kind of, actually the assumption is: if a guest feature
> > > > > (VIRTIO_NET_F_GUEST_XXX) is support, the corresponding host feature
> > > > > (VIRTIO_NET_F_HOST_XXX) is also supported.
> > > >
> > > > So nothing tells us that the TUNSETOFFLOAD is going to set GUEST offloads.
> > > > From if_tun.h
> > > > #define TUN_F_CSUM 0x01 /* You can hand me unchecksummed packets. */
> > > > #define TUN_F_TSO4 0x02 /* I can handle TSO for IPv4 packets */
> > > > #define TUN_F_TSO6 0x04 /* I can handle TSO for IPv6 packets */
> > > > #define TUN_F_TSO_ECN 0x08 /* I can handle TSO with ECN bits. */
> > > > #define TUN_F_UFO 0x10 /* I can handle UFO packets */
> > >
> > > Yes, that's why I replied in another thread to say that there's no way
> > > to refuse GSO packets from userspace, even if TUN_F_XXX is not set via
> > > tun_set_offload().
> > >
> > > E.g you can disable sending GSO packets to guests but you can't reject
> > > GSO packets from guest/userspace.
> >
> > We agree here.
> > Sorry for being unclear. I meant following:
> > According to the comment the TUN_F_CSUM is a _host_ capability.
> > According to the comment the TUN_F_UFO is a _guest_ capability.
> >
> > But surprisingly when TUN receives TUN_F_UFO it does not propagate it
> > anywhere, there is no corresponding NETIF flag.
>
> (It looks like I drop the community and other ccs accidentally, adding
> them back and sorry)
I thought you did it intentionally to avoid the flame
>
> Actually, there is one, NETIF_F_GSO_UDP.
>
> Kernel used to have NETIF_F_UFO, but it was removed due to bugs and
> the lack of real hardware support. Then we found it breaks uABI, so
> Willem tries to make it appear for userspace again, and then it was
> renamed to NETIF_F_GSO_UDP.
>
> But I think it's a bug that we don't proporate TUN_F_UFO to NETIF
> flag, this is a must for the driver that doesn't support
> VIRTIO_NET_F_GUEST_UFO. I just try to disable all offloads and
> mrg_rxbuf, then netperf UDP_STREAM from host to guest gives me bad
> length packet in the guest.
>
> Willem, I think we probably need to fix this.
>
>
> > So in fact TUN_F_UFO is processed by the TUN/TAP exactly as a host capability.
> >
> > >
> > > >
> > > > So, let's write
> > > >
> > > > #define TUN_F_UDP_L4TX 0x20 /* You can send me large UDP packets */
> > >
> > > So if we stick to the assumption "if a guest feature is supported, the
> > > corresponding host feature is supported". There's no need for this.
> > > And I think it's the most clean way.
> >
> > My personal opinion is that it is extremely wrong to extend such an
> > unobvious assumption to each new feature.
>
> This results in inconsistency with other GSO/CSUM flags. And will
> complicate the uAPI (two flags, one for RX another for TX).
>
> Considering the current code works for many years, it's not worth
> bothering I think.
>
> >
> > >
> > > > #define TUN_F_UDP4_L4RX 0x40 /* I can coalesce UDPv4 segments */
> > > > #define TUN_F_UDP6_L4RX 0x80 /* I can coalesce UDPv6 segments */
> > >
> > > Any value to coalesce UDP segments here? It's better to do it in the
> > > TX source (guest).
> >
> > Coalescing is a consent of the guest to receive packets bigger than MTU.
> > Otherwise (if the guest does not agree) the host must segment/fragment
> > packets before transmitting them to the guest.
>
> This looks like a different feature which is not necessarily known by guests?
>
> Kernel supports GRO which can coalesce packets. (It was not supported
> by TAP yet though).
If I understand things correctly this is exactly this feature:
The guest transmits a large UDP packet with the GSO value that means
that the host should segment it _if needed_.
If the destination (for example another guest) is not able to receive
the original large packet it is segmented and the segments pushed to
that guest.
If the destination can receive the original large packet (currently
not) it is just pushed to it as if it was segmented and then
coalesced.
As an example of the same with TCP:
With the current kernel Windows guest receives coalesced packets (for
example when segmented packets come via physical adapter with
coalescing capability) due to the fact that it dynamically enables
VIRTIO_NET_F_GUEST_TSO via VIRTIO_NET_CTRL_GUEST_OFFLOADS which
finally sets NETIF_F_TSO.
>
>
> > It is not related to guest TX.
> >
> > For example, Windows guest is not able to handle large UDP packets
> > (this is not supported by the stack yet).
>
> In this case, the corresponding guest or host features will be
> disabled, and the kernel won't send those kinds of GSO packets to
> guests.
>
> >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > 2. Later it configures the guest RX path according to guest's needs/capabilities
> > > > > > > > Typical initialization sequence is (in case the QEMU supports USO feature):
> > > > > > >
> > > > > > > It also depends on whether the backend(TAP) has the support for guest RX.
> > > > > >
> > > > > > In the code of TAP and TUN I do not see any "if the backend has the
> > > > > > support for guest RX".
> > > > >
> > > > > Yes, the detection is implied as you described above.
> > > > >
> > > > > > This is just the IOCTL and set of TUN_F_* bits. Their meaning is
> > > > > > defined in the comments.
> > > > > >
> > > > > > >
> > > > > > > > TAP/TUN set offload 11 (probe for UFO support)
> > > > > > > > TAP/TUN set offload 21 (probe for USO support)
> > > > > > > > TAP/TUN set offload 0
> > > > > > > > ...
> > > > > > > > TAP/TUN set offload 7 (configuration of offloads according to GUEST features)
> > > > > > > >
> > > > > > > > This series of patches is for VIRTIO_NET_F_HOST_USO only, virtio-net
> > > > > > > > features like VIRTIO_NET_F_GUEST_USO_(4/6/whatever) are not defined in
> > > > > > > > the spec yet.
> > > > > > > >
> > > > > > >
> > > > > > > I'm a little bit confused here. Consider you want to implement guest
> > > > > > > TX so there's no need for any modification on the set_offload().
> > > > > >
> > > > > > I do not think so. Please correct me if I'm mistaken:
> > > > > > QEMU needs to indicate the HOST_USO feature (or not indicate).
> > > > > > How can QEMU know the kernel is able to support VIRTIO_NET_HDR_GSO_UDP_L4?
> > > > >
> > > > > Ok, I finally get you idea. Thanks for the patience.
> > > > >
> > > > > But still one issue: Assume we implement VIRTIO_NET_F_HOST_USO. How
> > > > > could we add VIRTIO_NET_F_GUEST_USO in the future? Adding another TUN
> > > > > flag for set_offload()? Seems unnecessary and inconsistency with
> > > > > current TUN flags.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > I think we need to implement both directions at one time as what has
> > > > > > > been partially done in this series:
> > > > > > >
> > > > > >
> > > > > > You actually suggest that we need to start from Linux virtio-net
> > > > > > driver and implement on it both TX and RX.
> > > > > > Our main area is virtio-win drivers and all the rest we do when we can.
> > > > > > Currently we have 2 WIP tasks related to Linux (virtio-net RSS and
> > > > > > libvirt RSS/eBPF) and (my feeling) we hardly can start with additional
> > > > > > one.
> > > > >
> > > > > I can help for the linux driver if you wish.
> > > >
> > > > I understand. Probably I've made a mistake from the beginning:
> > > > At first stage I've prepared the spec change of what we need in hope
> > > > that this will be fast.
> > > > Probably the better way was to prepare RFC patches first then start
> > > > changing the spec.
> > > >
> > > > So the question is what to do now:
> > > > A)
> > > > Finalize patches for guest TX and respective QEMU patches
> > > > Prepare RFC patches for guest RX, get ack on them
> > > > Change the spec
> > > > Finalize patches for guest RX according to the spec
> > > >
> > > > B)
> > > > Reject the patches for guest TX
> > > > Prepare RFC patches for everything, get ack on them
> > > > Change the spec
> > > > Finalize patches for everything according to the spec
> > > >
> > > > I'm for A) of course :)
> > >
> > > I'm for B :)
> > >
> > > The reasons are:
> > >
> > > 1) keep the assumption of tun_set_offload() to simply the logic and
> > > compatibility
> > > 2) it's hard or tricky to touch guest TX path only (e.g the
> > > virtio_net_hdr_from_skb() is called in both RX and TX)
> >
> > I suspect there is _some_ misunderstanding here.
> > I did not touch virtio_net_hdr_from_skb at all.
> >
>
> Typo, actually I meant virtio_net_hdr_to_skb().
OK.
2) tun_get_user() which is guest TX - this is covered
3) tap_get_user() which is guest TX - this is covered
4) {t}packet_send() which is userspace TX - this is OK, the userspace
does not have this feature, it will never use USO
1) receive_buf() which is Linux guest RX - this is interesting
Do you mean that with my patches if Windows VM sends a packet with USO
- the Linux VM will not receive it correctly segmented?
When I send packets with USO via TUN I receive them segmented on
another TUN (2 Windows adapters).
>
> Thanks
>
> > >
> > > Thanks
> > >
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > > > This is a reason why I've added to the virtio spec only HOST_USO and
> > > > > > not GUEST_USO4/6.
> > > > > > UDP RSC (which is actually guest rx USO) is not available on Windows
> > > > > > at the moment.
> > > > > >
> > > > > > > 1) set_offload() is for guest RX.
> > > > > > > 2) virtio_net_hdr_to_skb() is for both guest TX and guest RX.
> > > > > > >
> > > > > > > For testing, you can run VM2VM on the same host, and you will get
> > > > > > > everything tested.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > > >
> > > > > > > > > When VIRTIO_NET_F_GUEST_XXX was not negotiated, the corresponding netdev
> > > > > > > > > features needs to be disabled. When host tries to send those packets to
> > > > > > > > > guest, it needs to do software segmentation.
> > > > > > > > >
> > > > > > > > > See virtio_net_apply_guest_offloads().
> > > > > > > > >
> > > > > > > > > There's currently no way (or not need) to prevent tuntap from receiving
> > > > > > > > > GSO packets.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >> And how about macvtap?
> > > > > > > > > > We will check how to do that for macvtap. We will send a separate
> > > > > > > > > > patch for macvtap or ask for advice.
> > > > > > > > > >
> > > > > > > > > >> Thanks
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>> }
> > > > > > > > > >>>
> > > > > > > > > >>> /* This gives the user a way to test for new features in future by
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
> > > > > So the question is what to do now:
> > > > > A)
> > > > > Finalize patches for guest TX and respective QEMU patches
> > > > > Prepare RFC patches for guest RX, get ack on them
> > > > > Change the spec
> > > > > Finalize patches for guest RX according to the spec
> > > > >
> > > > > B)
> > > > > Reject the patches for guest TX
> > > > > Prepare RFC patches for everything, get ack on them
> > > > > Change the spec
> > > > > Finalize patches for everything according to the spec
> > > > >
> > > > > I'm for A) of course :)
> > > >
> > > > I'm for B :)
> > > >
> > > > The reasons are:
> > > >
> > > > 1) keep the assumption of tun_set_offload() to simply the logic and
> > > > compatibility
> > > > 2) it's hard or tricky to touch guest TX path only (e.g the
> > > > virtio_net_hdr_from_skb() is called in both RX and TX)
> > >
> > > I suspect there is _some_ misunderstanding here.
> > > I did not touch virtio_net_hdr_from_skb at all.
> > >
> >
> > Typo, actually I meant virtio_net_hdr_to_skb().
> OK.
> 2) tun_get_user() which is guest TX - this is covered
> 3) tap_get_user() which is guest TX - this is covered
> 4) {t}packet_send() which is userspace TX - this is OK, the userspace
> does not have this feature, it will never use USO
What do you mean exactly? I can certainly imagine packet socket users
that could benefit from using udp gso.
When adding support for a new GSO type in virtio_net_hdr, it ideally
is supported by all users of that interface. Alternatively, if some
users do not support the flag, a call that sets the flag has to
(continue to) fail hard, so that we can enable it at a later time.
> > But surprisingly when TUN receives TUN_F_UFO it does not propagate it
> > anywhere, there is no corresponding NETIF flag.
>
> (It looks like I drop the community and other ccs accidentally, adding
> them back and sorry)
>
> Actually, there is one, NETIF_F_GSO_UDP.
>
> Kernel used to have NETIF_F_UFO, but it was removed due to bugs and
> the lack of real hardware support. Then we found it breaks uABI, so
> Willem tries to make it appear for userspace again, and then it was
> renamed to NETIF_F_GSO_UDP.
>
> But I think it's a bug that we don't proporate TUN_F_UFO to NETIF
> flag, this is a must for the driver that doesn't support
> VIRTIO_NET_F_GUEST_UFO. I just try to disable all offloads and
> mrg_rxbuf, then netperf UDP_STREAM from host to guest gives me bad
> length packet in the guest.
>
> Willem, I think we probably need to fix this.
We had to add back support for the kernel to accept UFO packets from
userspace over tuntap.
The kernel does not generate such packets, so a guest should never be
concerned of receiving UFO packets.
Perhaps i'm misunderstanding the problem here.
On Fri, May 14, 2021 at 10:16 AM Jason Wang <[email protected]> wrote:
>
> On Fri, May 14, 2021 at 4:35 AM Willem de Bruijn
> <[email protected]> wrote:
> >
> > > > But surprisingly when TUN receives TUN_F_UFO it does not propagate it
> > > > anywhere, there is no corresponding NETIF flag.
> > >
> > > (It looks like I drop the community and other ccs accidentally, adding
> > > them back and sorry)
> > >
> > > Actually, there is one, NETIF_F_GSO_UDP.
> > >
> > > Kernel used to have NETIF_F_UFO, but it was removed due to bugs and
> > > the lack of real hardware support. Then we found it breaks uABI, so
> > > Willem tries to make it appear for userspace again, and then it was
> > > renamed to NETIF_F_GSO_UDP.
> > >
> > > But I think it's a bug that we don't proporate TUN_F_UFO to NETIF
> > > flag, this is a must for the driver that doesn't support
> > > VIRTIO_NET_F_GUEST_UFO. I just try to disable all offloads and
> > > mrg_rxbuf, then netperf UDP_STREAM from host to guest gives me bad
> > > length packet in the guest.
> > >
> > > Willem, I think we probably need to fix this.
> >
> > We had to add back support for the kernel to accept UFO packets from
> > userspace over tuntap.
> >
> > The kernel does not generate such packets, so a guest should never be
> > concerned of receiving UFO packets.
>
> That's my feeling as well.
>
> But when I:
>
> 1) turn off all guest gso feature and mrg rx buffers, in this case
> virtio-net will only allocate 1500 bytes for each packet
> 2) doing netperf (UDP_STREAM) from local host to guest, I see packet
> were truncated in the guest
Is it possible that the virtio-net does not disable UFO offload?
IMO it sets NETIF_F_LRO too bravely.
>
> >
> > Perhaps i'm misunderstanding the problem here.
> >
>
> I will re-check and get back to you.
> (probably need a while since I will not be online for the next week).
>
> Thanks
>
On Thu, May 13, 2021 at 11:43 PM Willem de Bruijn
<[email protected]> wrote:
>
> > > > > > So the question is what to do now:
> > > > > > A)
> > > > > > Finalize patches for guest TX and respective QEMU patches
> > > > > > Prepare RFC patches for guest RX, get ack on them
> > > > > > Change the spec
> > > > > > Finalize patches for guest RX according to the spec
> > > > > >
> > > > > > B)
> > > > > > Reject the patches for guest TX
> > > > > > Prepare RFC patches for everything, get ack on them
> > > > > > Change the spec
> > > > > > Finalize patches for everything according to the spec
> > > > > >
> > > > > > I'm for A) of course :)
> > > > >
> > > > > I'm for B :)
> > > > >
> > > > > The reasons are:
> > > > >
> > > > > 1) keep the assumption of tun_set_offload() to simply the logic and
> > > > > compatibility
> > > > > 2) it's hard or tricky to touch guest TX path only (e.g the
> > > > > virtio_net_hdr_from_skb() is called in both RX and TX)
> > > >
> > > > I suspect there is _some_ misunderstanding here.
> > > > I did not touch virtio_net_hdr_from_skb at all.
> > > >
> > >
> > > Typo, actually I meant virtio_net_hdr_to_skb().
> > OK.
> > 2) tun_get_user() which is guest TX - this is covered
> > 3) tap_get_user() which is guest TX - this is covered
> > 4) {t}packet_send() which is userspace TX - this is OK, the userspace
> > does not have this feature, it will never use USO
>
> What do you mean exactly? I can certainly imagine packet socket users
> that could benefit from using udp gso.
I've just tried to understand whether we have a real functional
problem due to the fact that I define the USO feature only for guest
TX path.
This set of patches modifies virtio_net_hdr_to_skb and Jason's comment
was that this procedure is called in both guest TX and RX, there are 4
places where the virtio_net_hdr_to_skb is called, userspace TX is one
of them.
AFAIU userspace 'socket' and 'user' backends of qemu do not have any
offloads at all so they will never use USO also.
Sorry for misunderstanding if any.
>
> When adding support for a new GSO type in virtio_net_hdr, it ideally
> is supported by all users of that interface. Alternatively, if some
> users do not support the flag, a call that sets the flag has to
> (continue to) fail hard, so that we can enable it at a later time.
I agree of course. IMO this is what I've tried to do. I did not have
in the initial plan to make Linux virtio-net to use the USO at all but
this should not present any problem (if I'm not mistaken).
On Fri, May 14, 2021 at 4:35 AM Willem de Bruijn
<[email protected]> wrote:
>
> > > But surprisingly when TUN receives TUN_F_UFO it does not propagate it
> > > anywhere, there is no corresponding NETIF flag.
> >
> > (It looks like I drop the community and other ccs accidentally, adding
> > them back and sorry)
> >
> > Actually, there is one, NETIF_F_GSO_UDP.
> >
> > Kernel used to have NETIF_F_UFO, but it was removed due to bugs and
> > the lack of real hardware support. Then we found it breaks uABI, so
> > Willem tries to make it appear for userspace again, and then it was
> > renamed to NETIF_F_GSO_UDP.
> >
> > But I think it's a bug that we don't proporate TUN_F_UFO to NETIF
> > flag, this is a must for the driver that doesn't support
> > VIRTIO_NET_F_GUEST_UFO. I just try to disable all offloads and
> > mrg_rxbuf, then netperf UDP_STREAM from host to guest gives me bad
> > length packet in the guest.
> >
> > Willem, I think we probably need to fix this.
>
> We had to add back support for the kernel to accept UFO packets from
> userspace over tuntap.
>
> The kernel does not generate such packets, so a guest should never be
> concerned of receiving UFO packets.
That's my feeling as well.
But when I:
1) turn off all guest gso feature and mrg rx buffers, in this case
virtio-net will only allocate 1500 bytes for each packet
2) doing netperf (UDP_STREAM) from local host to guest, I see packet
were truncated in the guest
>
> Perhaps i'm misunderstanding the problem here.
>
I will re-check and get back to you.
(probably need a while since I will not be online for the next week).
Thanks
On Fri, May 14, 2021 at 3:39 AM Yuri Benditovich
<[email protected]> wrote:
>
> On Fri, May 14, 2021 at 10:16 AM Jason Wang <[email protected]> wrote:
> >
> > On Fri, May 14, 2021 at 4:35 AM Willem de Bruijn
> > <[email protected]> wrote:
> > >
> > > > > But surprisingly when TUN receives TUN_F_UFO it does not propagate it
> > > > > anywhere, there is no corresponding NETIF flag.
> > > >
> > > > (It looks like I drop the community and other ccs accidentally, adding
> > > > them back and sorry)
> > > >
> > > > Actually, there is one, NETIF_F_GSO_UDP.
> > > >
> > > > Kernel used to have NETIF_F_UFO, but it was removed due to bugs and
> > > > the lack of real hardware support. Then we found it breaks uABI, so
> > > > Willem tries to make it appear for userspace again, and then it was
> > > > renamed to NETIF_F_GSO_UDP.
> > > >
> > > > But I think it's a bug that we don't proporate TUN_F_UFO to NETIF
> > > > flag, this is a must for the driver that doesn't support
> > > > VIRTIO_NET_F_GUEST_UFO. I just try to disable all offloads and
> > > > mrg_rxbuf, then netperf UDP_STREAM from host to guest gives me bad
> > > > length packet in the guest.
> > > >
> > > > Willem, I think we probably need to fix this.
> > >
> > > We had to add back support for the kernel to accept UFO packets from
> > > userspace over tuntap.
> > >
> > > The kernel does not generate such packets, so a guest should never be
> > > concerned of receiving UFO packets.
> >
> > That's my feeling as well.
> >
> > But when I:
> >
> > 1) turn off all guest gso feature and mrg rx buffers, in this case
> > virtio-net will only allocate 1500 bytes for each packet
> > 2) doing netperf (UDP_STREAM) from local host to guest, I see packet
> > were truncated in the guest
>
> Is it possible that the virtio-net does not disable UFO offload?
> IMO it sets NETIF_F_LRO too bravely.
After we removed UFO completely, we found that guests may be migrated
from old hosts with UFO support to newer without. And that they do not
renegotiate features, so will continue to send UFO packets.
I added back the absolute minimum support for UFO: for a host to be
able to accept such UFO packets from userspace. But no device can
advertise or negotiate the NETIF_F_USO feature again. If these packets
arrive on the egress path, they will be immediately software segmented
(or fragmented) in skb_segment. So the host will not forward such
packets to another guest.
The behavior that Jason is experiencing, truncated packets received in
a guest from the host, sound unrelated to this feature to me. Can you
see what the original UDP datagram length is? Are these packets just
marginally larger, or indeed clearly U[SF]O packets well beyond any
reasonable MTU size?
Another option is that this is related to the host stack support for
UDP_GRO. The stack can now build large packets, segments these on
demand if needed (e.g., if such a packet arrives at a local socket
that does not advertise UDP_GRO). Perhaps somehow such packets escape
un-segmented to a guest. Do any devices where these packets may
originate have features NETIF_F_GRO_UDP_FWD or NETIF_F_GRO_FRAGLIST
enabled?