RFC -> v1:
* added 'netns' module param to vsock.ko to enable the
network namespace support (disabled by default)
* added 'vsock_net_eq()' to check the "net" assigned to a socket
only when 'netns' support is enabled
RFC: https://patchwork.ozlabs.org/cover/1202235/
Now that we have multi-transport upstream, I started looking at
supporting network namespaces in vsock.
As we partially discussed in the multi-transport proposal [1], it would
be nice to support network namespaces in vsock to achieve the following
goals:
- isolate host applications from guest applications using the same ports
with CID_ANY
- assign the same CID to VMs running in different network namespaces
- partition VMs between VMMs or at finer granularity
This new feature is disabled by default, because it changes vsock's
behavior with network namespaces and could break existing applications.
It can be enabled with the new 'netns' module parameter of vsock.ko.
This implementation provides the following behavior:
- packets received from the host (received by G2H transports) are
assigned to the default netns (init_net)
- packets received from the guest (received by H2G - vhost-vsock) are
assigned to the netns of the process that opens /dev/vhost-vsock
(usually the VMM; qemu in my tests)
- for vmci I need some suggestions, because I don't know how to
implement and test the same behavior in the vmci driver; for now
vmci uses init_net
- loopback packets are exchanged only within the same netns
I tested the series in this way:
l0_host$ qemu-system-x86_64 -m 4G -M accel=kvm -smp 4 \
-drive file=/tmp/vsockvm0.img,if=virtio --nographic \
-device vhost-vsock-pci,guest-cid=3
l1_vm$ echo 1 > /sys/module/vsock/parameters/netns
l1_vm$ ip netns add ns1
l1_vm$ ip netns add ns2
# same CID on different netns
l1_vm$ ip netns exec ns1 qemu-system-x86_64 -m 1G -M accel=kvm -smp 2 \
-drive file=/tmp/vsockvm1.img,if=virtio --nographic \
-device vhost-vsock-pci,guest-cid=4
l1_vm$ ip netns exec ns2 qemu-system-x86_64 -m 1G -M accel=kvm -smp 2 \
-drive file=/tmp/vsockvm2.img,if=virtio --nographic \
-device vhost-vsock-pci,guest-cid=4
# all iperf3 listen on CID_ANY and port 5201, but in different netns
l1_vm$ ./iperf3 --vsock -s # connection from l0 or guests started
# on default netns (init_net)
l1_vm$ ip netns exec ns1 ./iperf3 --vsock -s
l1_vm$ ip netns exec ns2 ./iperf3 --vsock -s
l0_host$ ./iperf3 --vsock -c 3
l2_vm1$ ./iperf3 --vsock -c 2
l2_vm2$ ./iperf3 --vsock -c 2
[1] https://www.spinics.net/lists/netdev/msg575792.html
Stefano Garzarella (3):
vsock: add network namespace support
vsock/virtio_transport_common: handle netns of received packets
vhost/vsock: use netns of process that opens the vhost-vsock device
drivers/vhost/vsock.c | 29 ++++++++++++-----
include/linux/virtio_vsock.h | 2 ++
include/net/af_vsock.h | 7 +++--
net/vmw_vsock/af_vsock.c | 41 +++++++++++++++++++------
net/vmw_vsock/hyperv_transport.c | 5 +--
net/vmw_vsock/virtio_transport.c | 2 ++
net/vmw_vsock/virtio_transport_common.c | 12 ++++++--
net/vmw_vsock/vmci_transport.c | 5 +--
8 files changed, 78 insertions(+), 25 deletions(-)
--
2.24.1
This patch assigns the network namespace of the process that opened
the vhost-vsock device (e.g. the VMM) to the packets coming from the
guest, allowing only host sockets in the same network namespace to
communicate with the guest.
This patch also allows having different VMs, running in different
network namespaces, with the same CID.
Signed-off-by: Stefano Garzarella <[email protected]>
---
RFC -> v1
* used 'vsock_net_eq()' instead of 'net_eq()'
---
drivers/vhost/vsock.c | 30 +++++++++++++++++++++---------
1 file changed, 21 insertions(+), 9 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index f1d39939d5e4..8b0169105559 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -40,6 +40,7 @@ static DEFINE_READ_MOSTLY_HASHTABLE(vhost_vsock_hash, 8);
struct vhost_vsock {
struct vhost_dev dev;
struct vhost_virtqueue vqs[2];
+ struct net *net;
/* Link to global vhost_vsock_hash, writes use vhost_vsock_mutex */
struct hlist_node hash;
@@ -61,7 +62,7 @@ static u32 vhost_transport_get_local_cid(void)
/* Callers that dereference the return value must hold vhost_vsock_mutex or the
* RCU read lock.
*/
-static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
+static struct vhost_vsock *vhost_vsock_get(u32 guest_cid, struct net *net)
{
struct vhost_vsock *vsock;
@@ -72,7 +73,7 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
if (other_cid == 0)
continue;
- if (other_cid == guest_cid)
+ if (other_cid == guest_cid && vsock_net_eq(net, vsock->net))
return vsock;
}
@@ -245,7 +246,7 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
rcu_read_lock();
/* Find the vhost_vsock according to guest context id */
- vsock = vhost_vsock_get(le64_to_cpu(pkt->hdr.dst_cid));
+ vsock = vhost_vsock_get(le64_to_cpu(pkt->hdr.dst_cid), pkt->net);
if (!vsock) {
rcu_read_unlock();
virtio_transport_free_pkt(pkt);
@@ -277,7 +278,8 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk)
rcu_read_lock();
/* Find the vhost_vsock according to guest context id */
- vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
+ vsock = vhost_vsock_get(vsk->remote_addr.svm_cid,
+ sock_net(sk_vsock(vsk)));
if (!vsock)
goto out;
@@ -474,7 +476,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
continue;
}
- pkt->net = vsock_default_net();
+ pkt->net = vsock->net;
len = pkt->len;
/* Deliver to monitoring devices all received packets */
@@ -608,7 +610,14 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
vqs = kmalloc_array(ARRAY_SIZE(vsock->vqs), sizeof(*vqs), GFP_KERNEL);
if (!vqs) {
ret = -ENOMEM;
- goto out;
+ goto out_vsock;
+ }
+
+ /* Derive the network namespace from the pid opening the device */
+ vsock->net = get_net_ns_by_pid(current->pid);
+ if (IS_ERR(vsock->net)) {
+ ret = PTR_ERR(vsock->net);
+ goto out_vqs;
}
vsock->guest_cid = 0; /* no CID assigned yet */
@@ -630,7 +639,9 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
vhost_work_init(&vsock->send_pkt_work, vhost_transport_send_pkt_work);
return 0;
-out:
+out_vqs:
+ kfree(vqs);
+out_vsock:
vhost_vsock_free(vsock);
return ret;
}
@@ -655,7 +666,7 @@ static void vhost_vsock_reset_orphans(struct sock *sk)
*/
/* If the peer is still valid, no need to reset connection */
- if (vhost_vsock_get(vsk->remote_addr.svm_cid))
+ if (vhost_vsock_get(vsk->remote_addr.svm_cid, sock_net(sk)))
return;
/* If the close timeout is pending, let it expire. This avoids races
@@ -703,6 +714,7 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
spin_unlock_bh(&vsock->send_pkt_list_lock);
vhost_dev_cleanup(&vsock->dev);
+ put_net(vsock->net);
kfree(vsock->dev.vqs);
vhost_vsock_free(vsock);
return 0;
@@ -729,7 +741,7 @@ static int vhost_vsock_set_cid(struct vhost_vsock *vsock, u64 guest_cid)
/* Refuse if CID is already in use */
mutex_lock(&vhost_vsock_mutex);
- other = vhost_vsock_get(guest_cid);
+ other = vhost_vsock_get(guest_cid, vsock->net);
if (other && other != vsock) {
mutex_unlock(&vhost_vsock_mutex);
return -EADDRINUSE;
--
2.24.1
This patch allows transports that use virtio_transport_common
to specify the network namespace where a received packet is to
be delivered.
virtio_transport and vhost_transport, for now, use the default
network namespace.
vsock_loopback uses the same network namespace as the transmitter.
Signed-off-by: Stefano Garzarella <[email protected]>
---
drivers/vhost/vsock.c | 1 +
include/linux/virtio_vsock.h | 2 ++
net/vmw_vsock/virtio_transport.c | 2 ++
net/vmw_vsock/virtio_transport_common.c | 13 ++++++++++---
4 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index c2d7d57e98cf..f1d39939d5e4 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -474,6 +474,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
continue;
}
+ pkt->net = vsock_default_net();
len = pkt->len;
/* Deliver to monitoring devices all received packets */
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 71c81e0dc8f2..d4fc93e6e03e 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -43,6 +43,7 @@ struct virtio_vsock_pkt {
struct list_head list;
/* socket refcnt not held, only use for cancellation */
struct vsock_sock *vsk;
+ struct net *net;
void *buf;
u32 buf_len;
u32 len;
@@ -54,6 +55,7 @@ struct virtio_vsock_pkt_info {
u32 remote_cid, remote_port;
struct vsock_sock *vsk;
struct msghdr *msg;
+ struct net *net;
u32 pkt_len;
u16 type;
u16 op;
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index dfbaf6bd8b1c..fb03a1535c21 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -527,6 +527,8 @@ static void virtio_transport_rx_work(struct work_struct *work)
}
pkt->len = len - sizeof(pkt->hdr);
+ pkt->net = vsock_default_net();
+
virtio_transport_deliver_tap_pkt(pkt);
virtio_transport_recv_pkt(&virtio_transport, pkt);
}
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index cecdfd91ed00..6402dea62e45 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -63,6 +63,7 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
pkt->hdr.len = cpu_to_le32(len);
pkt->reply = info->reply;
pkt->vsk = info->vsk;
+ pkt->net = info->net;
if (info->msg && len > 0) {
pkt->buf = kmalloc(len, GFP_KERNEL);
@@ -273,6 +274,7 @@ static int virtio_transport_send_credit_update(struct vsock_sock *vsk,
.op = VIRTIO_VSOCK_OP_CREDIT_UPDATE,
.type = type,
.vsk = vsk,
+ .net = sock_net(sk_vsock(vsk)),
};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -622,6 +624,7 @@ int virtio_transport_connect(struct vsock_sock *vsk)
.op = VIRTIO_VSOCK_OP_REQUEST,
.type = VIRTIO_VSOCK_TYPE_STREAM,
.vsk = vsk,
+ .net = sock_net(sk_vsock(vsk)),
};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -638,6 +641,7 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode)
(mode & SEND_SHUTDOWN ?
VIRTIO_VSOCK_SHUTDOWN_SEND : 0),
.vsk = vsk,
+ .net = sock_net(sk_vsock(vsk)),
};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -665,6 +669,7 @@ virtio_transport_stream_enqueue(struct vsock_sock *vsk,
.msg = msg,
.pkt_len = len,
.vsk = vsk,
+ .net = sock_net(sk_vsock(vsk)),
};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -687,6 +692,7 @@ static int virtio_transport_reset(struct vsock_sock *vsk,
.type = VIRTIO_VSOCK_TYPE_STREAM,
.reply = !!pkt,
.vsk = vsk,
+ .net = sock_net(sk_vsock(vsk)),
};
/* Send RST only if the original pkt is not a RST pkt */
@@ -707,6 +713,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
.op = VIRTIO_VSOCK_OP_RST,
.type = le16_to_cpu(pkt->hdr.type),
.reply = true,
+ .net = pkt->net,
};
/* Send RST only if the original pkt is not a RST pkt */
@@ -991,6 +998,7 @@ virtio_transport_send_response(struct vsock_sock *vsk,
.remote_port = le32_to_cpu(pkt->hdr.src_port),
.reply = true,
.vsk = vsk,
+ .net = sock_net(sk_vsock(vsk)),
};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -1088,7 +1096,6 @@ virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt,
void virtio_transport_recv_pkt(struct virtio_transport *t,
struct virtio_vsock_pkt *pkt)
{
- struct net *net = vsock_default_net();
struct sockaddr_vm src, dst;
struct vsock_sock *vsk;
struct sock *sk;
@@ -1116,9 +1123,9 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
/* The socket must be in connected or bound table
* otherwise send reset back
*/
- sk = vsock_find_connected_socket(&src, &dst, net);
+ sk = vsock_find_connected_socket(&src, &dst, pkt->net);
if (!sk) {
- sk = vsock_find_bound_socket(&dst, net);
+ sk = vsock_find_bound_socket(&dst, pkt->net);
if (!sk) {
(void)virtio_transport_reset_no_sock(t, pkt);
goto free_pkt;
--
2.24.1
What should vsock_dev_do_ioctl() IOCTL_VM_SOCKETS_GET_LOCAL_CID return?
The answer is probably dependent on the caller's network namespace.
Ultimately we may need per-namespace transports. Imagine assigning a
G2H transport to a specific network namespace.
vsock_stream_connect() needs to be namespace-aware so that other
namespaces cannot use the G2H transport to send a connection
establishment packet.
On Tue, Jan 21, 2020 at 03:50:53PM +0000, Stefan Hajnoczi wrote:
> What should vsock_dev_do_ioctl() IOCTL_VM_SOCKETS_GET_LOCAL_CID return?
> The answer is probably dependent on the caller's network namespace.
Right, and I'm not handling this case. I'll fix!
>
> Ultimately we may need per-namespace transports. Imagine assigning a
> G2H transport to a specific network namespace.
Agree.
>
> vsock_stream_connect() needs to be namespace-aware so that other
> namespaces cannot use the G2H transport to send a connection
> establishment packet.
Right, maybe I can change vsock_assign_transport() to check whether a
transport can be assigned to a socket, based on its namespace.
I'll send a v2 handling these cases and implementing Michael's idea
about /dev/vhost-vsock-netns
Thanks,
Stefano
Hi David, Michael, Stefan,
I'm restarting work on this topic since the Kata folks are interested
in having it, especially on the guest side.
While working on the v2 I had a few doubts, and I'd like your
suggestions:
1. netns assigned to the device inside the guest
Currently I assign this device to 'init_net'. Maybe it is better
to allow the user to decide which netns to assign to the device,
or to disable this new feature to keep the same behavior as before
(host reachable from any netns).
I think we can handle this in the vsock core and not in the single
transports.
The simplest way that I found is to add a new
IOCTL_VM_SOCKETS_ASSIGN_G2H_NETNS to /dev/vsock to enable the feature
and assign the device to the netns of the process that does the
ioctl(), but I'm not sure it is clean enough.
Maybe it is better to add new rtnetlink messages, but I'm not sure
that is feasible since we don't have a netdev device.
What do you suggest?
2. netns assigned in the host
As Michael suggested, I added a new /dev/vhost-vsock-netns to allow
userspace applications to use this new feature, leaving
/dev/vhost-vsock with the previous behavior (guest reachable from any
netns).
I like this approach, but I had these doubts:
- Do I need to allocate a new minor for that device (e.g.
VHOST_VSOCK_NETNS_MINOR), or is there an alternative way that I can
use?
- It is vhost-vsock specific; should we provide something handled in
the vsock core, maybe centralizing the CID allocation and adding a
new IOCTL or rtnetlink message like for the guest side?
(Maybe this could be a second step; for now we can continue with
the new device.)
Thanks for the help,
Stefano
On Thu, Jan 16, 2020 at 06:24:25PM +0100, Stefano Garzarella wrote:
> [...]
On Mon, Apr 27, 2020 at 04:25:18PM +0200, Stefano Garzarella wrote:
> Hi David, Michael, Stefan,
> I'm restarting to work on this topic since Kata guys are interested to
> have that, especially on the guest side.
>
> While working on the v2 I had few doubts, and I'd like to have your
> suggestions:
>
> 1. netns assigned to the device inside the guest
>
> Currently I assigned this device to 'init_net'. Maybe it is better
> if we allow the user to decide which netns assign to the device
> or to disable this new feature to have the same behavior as before
> (host reachable from any netns).
> I think we can handle this in the vsock core and not in the single
> transports.
>
> The simplest way that I found, is to add a new
> IOCTL_VM_SOCKETS_ASSIGN_G2H_NETNS to /dev/vsock to enable the feature
> and assign the device to the same netns of the process that do the
> ioctl(), but I'm not sure it is clean enough.
>
> Maybe it is better to add new rtnetlink messages, but I'm not sure if
> it is feasible since we don't have a netdev device.
>
> What do you suggest?
Maybe /dev/vsock-netns here too, like in the host?
>
> 2. netns assigned in the host
>
> As Michael suggested, I added a new /dev/vhost-vsock-netns to allow
> userspace application to use this new feature, leaving to
> /dev/vhost-vsock the previous behavior (guest reachable from any
> netns).
>
> I like this approach, but I had these doubts:
>
> - I need to allocate a new minor for that device (e.g.
> VHOST_VSOCK_NETNS_MINOR) or is there an alternative way that I can
> use?
Not that I see. I agree it's a bit annoying. I'll think about it a bit.
> - It is vhost-vsock specific, should we provide something handled in
> the vsock core, maybe centralizing the CID allocation and adding a
> new IOCTL or rtnetlink message like for the guest side?
> (maybe it could be a second step, and for now we can continue with
> the new device)
>
>
> Thanks for the help,
> Stefano
>
>
> On Thu, Jan 16, 2020 at 06:24:25PM +0100, Stefano Garzarella wrote:
> > [...]
On Mon, Apr 27, 2020 at 10:31:57AM -0400, Michael S. Tsirkin wrote:
> On Mon, Apr 27, 2020 at 04:25:18PM +0200, Stefano Garzarella wrote:
> > [...]
> >
> > 1. netns assigned to the device inside the guest
> > [...]
>
> Maybe /dev/vsock-netns here too, like in the host?
>
I'm not sure I get it.
In the guest, /dev/vsock is only used to get the CID assigned to the
guest through an ioctl().
In the virtio-vsock case, the guest transport is loaded when it is discovered
on the PCI bus, so we need a way to "move" it to a netns or to specify
which netns should be used when it is probed.
>
> >
> > 2. netns assigned in the host
> >
> > As Michael suggested, I added a new /dev/vhost-vsock-netns to allow
> > userspace application to use this new feature, leaving to
> > /dev/vhost-vsock the previous behavior (guest reachable from any
> > netns).
> >
> > I like this approach, but I had these doubts:
> >
> > - I need to allocate a new minor for that device (e.g.
> > VHOST_VSOCK_NETNS_MINOR) or is there an alternative way that I can
> > use?
>
> Not that I see. I agree it's a bit annoying. I'll think about it a bit.
>
Thanks for that!
An idea I had was to add a new ioctl to /dev/vhost-vsock to enable
the netns support, but I'm not sure it is a clean approach.
> > - It is vhost-vsock specific, should we provide something handled in
> > the vsock core, maybe centralizing the CID allocation and adding a
> > new IOCTL or rtnetlink message like for the guest side?
> > (maybe it could be a second step, and for now we can continue with
> > the new device)
> >
Thanks,
Stefano
On 2020/4/27 10:25 PM, Stefano Garzarella wrote:
> [...]
>
> 1. netns assigned to the device inside the guest
> [...]
> What do you suggest?
As we've discussed, it should probably be a netdev on either the guest
or host side. And it would be much simpler if we want to implement
namespaces then. No new API is needed.
Thanks
> [...]
>
>
> On Thu, Jan 16, 2020 at 06:24:25PM +0100, Stefano Garzarella wrote:
>> [...]
On Tue, Apr 28, 2020 at 04:13:22PM +0800, Jason Wang wrote:
>
> On 2020/4/27 10:25 PM, Stefano Garzarella wrote:
> > [...]
> >
> > 1. netns assigned to the device inside the guest
> > [...]
>
>
> As we've discussed, it should probably be a netdev on either the guest
> or host side. And it would be much simpler if we want to implement
> namespaces then. No new API is needed.
>
Thanks Jason!
It would be cool, but I don't have much experience with netdev.
Do you see any particular obstacles?
I'll take a look to understand how to do it; having the vsock device
as a netdev would surely be very useful in the guest, and maybe also
in the host.
Stefano
On 2020/4/29 12:00 AM, Stefano Garzarella wrote:
> On Tue, Apr 28, 2020 at 04:13:22PM +0800, Jason Wang wrote:
>> On 2020/4/27 10:25 PM, Stefano Garzarella wrote:
>>> [...]
>> As we've discussed, it should probably be a netdev on either the guest
>> or host side. And it would be much simpler if we want to implement
>> namespaces then. No new API is needed.
>>
> Thanks Jason!
>
> It would be cool, but I don't have much experience with netdev.
> Do you see any particular obstacles?
I don't see any, but if there are, we can try to find a solution or
ask netdev experts for help. I did hear from somebody in the past who
was interested in having a netdev.
>
> I'll take a look to understand how to do it, surely in the guest would
> be very useful to have the vsock device as a netdev and maybe also in the host.
Yes, it's worth a try; then we will have a unified management
interface and we will benefit from it in the future.
Starting from the guest is a good idea, since it should be less
complicated than the host.
Thanks
>
> Stefano
>