Date: Mon, 13 May 2019 19:23:22 +0200
From: Stefano Garzarella
To: Jason Wang
Cc: netdev@vger.kernel.org, "David S. Miller", "Michael S. Tsirkin",
    virtualization@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
    kvm@vger.kernel.org, Stefan Hajnoczi
Subject: Re: [PATCH v2 1/8] vsock/virtio: limit the memory used per-socket
Message-ID: <20190513172322.vcgenx7xk4v6r2ay@steredhat>
In-Reply-To: <3b275b52-63d9-d260-1652-8e8bf7dd679f@redhat.com>

On Mon, May 13, 2019 at 05:58:53PM +0800, Jason Wang wrote:
>
> On 2019/5/10 8:58 PM, Stefano Garzarella wrote:
> > Since virtio-vsock was introduced, the buffers filled by the host
> > and pushed to the guest using the vring are directly queued in
> > a per-socket list to avoid copying them.
> > These buffers are preallocated by the guest with a fixed
> > size (4 KB).
> >
> > The maximum amount of memory used by each socket should be
> > controlled by the credit mechanism.
> > The default credit available per-socket is 256 KB, but if we use
> > only 1 byte per packet, the guest can queue up to 262144 4 KB
> > buffers, using up to 1 GB of memory per-socket. In addition, the
> > guest will continue to fill the vring with new 4 KB free buffers
> > to avoid starvation of other sockets.
> >
> > This patch solves this issue by copying the payload into a new buffer.
> > Then it is queued in the per-socket list, and the 4 KB buffer used
> > by the host is freed.
> >
> > In this way, the memory used by each socket respects the credit
> > available, and we still avoid starvation, paying the cost of an
> > extra memory copy. When the buffer is completely full we do a
> > "zero-copy", moving the buffer directly into the per-socket list.
>
>
> I wonder whether in the long run we should use the generic socket
> accounting mechanism provided by the kernel (e.g. socket, skb, sndbuf,
> recvbuf, truesize) instead of a vsock-specific one, to avoid duplicating
> effort.

I agree. The idea is to switch to sk_buff, but that would require a huge
change. If we use the virtio-net datapath, it will become simpler.

>
>
> >
> > Signed-off-by: Stefano Garzarella
> > ---
> >   drivers/vhost/vsock.c                   |  2 +
> >   include/linux/virtio_vsock.h            |  8 +++
> >   net/vmw_vsock/virtio_transport.c        |  1 +
> >   net/vmw_vsock/virtio_transport_common.c | 95 ++++++++++++++++++-------
> >   4 files changed, 81 insertions(+), 25 deletions(-)
> >
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index bb5fc0e9fbc2..7964e2daee09 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -320,6 +320,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
> >  		return NULL;
> >  	}
> > +	pkt->buf_len = pkt->len;
> > +
> >  	nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter);
> >  	if (nbytes != pkt->len) {
> >  		vq_err(vq, "Expected %u byte payload, got %zu bytes\n",
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index e223e2632edd..345f04ee9193 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -54,9 +54,17 @@ struct virtio_vsock_pkt {
> >  	void *buf;
> >  	u32 len;
> >  	u32 off;
> > +	u32 buf_len;
> >  	bool reply;
> >  };
> > +struct virtio_vsock_buf {
> > +	struct list_head list;
> > +	void *addr;
> > +	u32 len;
> > +	u32 off;
> > +};
> > +
> >  struct virtio_vsock_pkt_info {
> >  	u32 remote_cid, remote_port;
> >  	struct vsock_sock *vsk;
> > diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> > index 15eb5d3d4750..af1d2ce12f54 100644
> > --- a/net/vmw_vsock/virtio_transport.c
> > +++ b/net/vmw_vsock/virtio_transport.c
> > @@ -280,6 +280,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
> >  			break;
> >  		}
> > +		pkt->buf_len = buf_len;
> >  		pkt->len = buf_len;
> >  		sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index 602715fc9a75..0248d6808755 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -65,6 +65,9 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
> >  		pkt->buf = kmalloc(len, GFP_KERNEL);
> >  		if (!pkt->buf)
> >  			goto out_pkt;
> > +
> > +		pkt->buf_len = len;
> > +
> >  		err = memcpy_from_msg(pkt->buf, info->msg, len);
> >  		if (err)
> >  			goto out;
> > @@ -86,6 +89,46 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
> >  	return NULL;
> >  }
> > +static struct virtio_vsock_buf *
> > +virtio_transport_alloc_buf(struct virtio_vsock_pkt *pkt, bool zero_copy)
> > +{
> > +	struct virtio_vsock_buf *buf;
> > +
> > +	if (pkt->len == 0)
> > +		return NULL;
> > +
> > +	buf = kzalloc(sizeof(*buf), GFP_KERNEL);
> > +	if (!buf)
> > +		return NULL;
> > +
> > +	/* If the buffer in the virtio_vsock_pkt is full, we can move it to
> > +	 * the new virtio_vsock_buf avoiding the copy, because we are sure that
> > +	 * we do not use more memory than is counted by the credit mechanism.
> > +	 */
> > +	if (zero_copy && pkt->len == pkt->buf_len) {
> > +		buf->addr = pkt->buf;
> > +		pkt->buf = NULL;
> > +	} else {
>
>
> Is the copy still needed if we're just a few bytes short? We hit a similar
> issue in virtio-net, and virtio-net solves it by always copying the first
> 128 bytes for big packets.
>
> See receive_big()

I see, it is more sophisticated.
IIUC, virtio-net allocates an sk_buff with a 128-byte buffer, copies the
first 128 bytes into it, and then adds the buffer used to receive the
packet to the skb as a frag.

Do you suggest implementing something similar, or can we use my approach
for now and reuse the virtio-net approach if we merge the datapaths?

Thanks,
Stefano
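
For reference, the worst-case arithmetic from the commit message can be
checked with a small userspace model. This is only a sketch under stated
assumptions, not kernel code: held_bytes() and worst_case_queued() are
hypothetical helpers, and the model counts payload buffers only, ignoring
the per-buffer struct and allocator overhead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the per-socket rx accounting.
 * Returns the payload memory kept queued for one received packet.
 */
static uint64_t held_bytes(uint32_t payload_len, uint32_t buf_len,
			   bool copy_partial)
{
	assert(payload_len <= buf_len);

	/* Empty packets carry no payload, so nothing is queued. */
	if (payload_len == 0)
		return 0;

	/*
	 * Partially filled rx buffer: the payload is copied into an
	 * exact-size buffer and the rx buffer is freed.
	 */
	if (copy_partial && payload_len < buf_len)
		return payload_len;

	/* Otherwise the whole rx buffer is queued as-is ("zero-copy"). */
	return buf_len;
}

/*
 * Worst-case payload memory queued on one socket when the peer spends
 * all of its credit on packets of payload_len bytes each.
 */
static uint64_t worst_case_queued(uint32_t credit, uint32_t payload_len,
				  uint32_t buf_len, bool copy_partial)
{
	uint64_t npkts;

	assert(payload_len > 0);

	/* Credit is consumed per payload byte, not per rx buffer. */
	npkts = credit / payload_len;

	return npkts * held_bytes(payload_len, buf_len, copy_partial);
}
```

With a 256 KB credit, 4 KB receive buffers and 1-byte packets,
worst_case_queued(256 * 1024, 1, 4096, false) gives 1073741824 bytes
(1 GB), while the copying variant (copy_partial = true) gives 262144
bytes, i.e. exactly the credit.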