Date: Mon, 29 Jul 2019 15:10:15 -0400
From: "Michael S. Tsirkin"
To: Stefano Garzarella
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    Stefan Hajnoczi, "David S. Miller",
    virtualization@lists.linux-foundation.org, Jason Wang,
    kvm@vger.kernel.org
Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket
Message-ID: <20190729143622-mutt-send-email-mst@kernel.org>
References: <20190717113030.163499-1-sgarzare@redhat.com>
 <20190717113030.163499-2-sgarzare@redhat.com>
 <20190729095956-mutt-send-email-mst@kernel.org>
 <20190729153656.zk4q4rob5oi6iq7l@steredhat>
 <20190729114302-mutt-send-email-mst@kernel.org>
 <20190729161903.yhaj5rfcvleexkhc@steredhat>
 <20190729165056.r32uzj6om3o6vfvp@steredhat>
In-Reply-To: <20190729165056.r32uzj6om3o6vfvp@steredhat>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jul 29, 2019 at 06:50:56PM +0200, Stefano Garzarella wrote:
> On Mon, Jul 29, 2019 at 06:19:03PM +0200, Stefano Garzarella wrote:
> > On Mon, Jul 29, 2019 at 11:49:02AM -0400, Michael S. Tsirkin wrote:
> > > On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote:
> > > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
> > > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
> > > > > > Since virtio-vsock was introduced, the buffers filled by the host
> > > > > > and pushed to the guest using the vring are directly queued in
> > > > > > a per-socket list. These buffers are preallocated by the guest
> > > > > > with a fixed size (4 KB).
> > > > > >
> > > > > > The maximum amount of memory used by each socket should be
> > > > > > controlled by the credit mechanism.
> > > > > > The default credit available per-socket is 256 KB, but if we use
> > > > > > only 1 byte per packet, the guest can queue up to 262144 4 KB
> > > > > > buffers, using up to 1 GB of memory per-socket. In addition, the
> > > > > > guest will continue to fill the vring with new 4 KB free buffers
> > > > > > to avoid starvation of other sockets.
> > > > > >
> > > > > > This patch mitigates the issue by copying the payload of small
> > > > > > packets (< 128 bytes) into the buffer of the last packet queued,
> > > > > > in order to avoid wasting memory.
> > > > > >
> > > > > > Reviewed-by: Stefan Hajnoczi
> > > > > > Signed-off-by: Stefano Garzarella
> > > > >
> > > > > This is good enough for net-next, but for net I think we
> > > > > should figure out how to address the issue completely.
> > > > > Can we make the accounting precise? What happens to
> > > > > performance if we do?
> > > >
> > > > In order to do more precise accounting, maybe we can use the buffer
> > > > size instead of the payload size when we update the credit
> > > > available. In this way, the credit available for each socket will
> > > > reflect the memory actually used.
> > > >
> > > > I should check further, because I'm not sure what happens if the
> > > > peer sees 1 KB of space available and then sends 1 KB of payload
> > > > (using a 4 KB buffer).
> > > >
> > > > The other option is to copy each packet into a new buffer, as I did
> > > > in v2 [2], but this forces us to make a copy for every packet that
> > > > does not fill the entire buffer, which is perhaps too expensive.
> > > >
> > > > [2] https://patchwork.kernel.org/patch/10938741/
> > > >
> > > > Thanks,
> > > > Stefano
> > >
> > > Interesting. You are right, and at some level the protocol forces
> > > copies.
> > >
> > > We could try to detect that the actual memory is getting close to
> > > admin limits and force copies on queued packets after the fact.
> > > Is that practical?
> >
> > Yes, I think it is doable!
> > We can decrease the credit available by the buffer size queued, and
> > when the buffer size of a packet to queue is bigger than the credit
> > available, we can copy it.
> >
> > >
> > > And yes we can extend the credit accounting to include buffer size.
> > > That's a protocol change but maybe it makes sense.
> >
> > Since we send the credit available to the other peer, maybe this
> > change can be backwards compatible (I'll check this further).
>
> What I said was wrong.
>
> We send a counter (increased when the user consumes the packets) and
> the "buf_alloc" (the max memory allowed) to the other peer.
> The sender takes the difference between a local counter (increased
> when packets are sent) and the remote counter to calculate the credit
> available:
>
> u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit)
> {
> 	u32 ret;
>
> 	spin_lock_bh(&vvs->tx_lock);
> 	ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
> 	if (ret > credit)
> 		ret = credit;
> 	vvs->tx_cnt += ret;
> 	spin_unlock_bh(&vvs->tx_lock);
>
> 	return ret;
> }
>
> Maybe I can play with "buf_alloc" to take care of bytes queued but not
> used.
>
> Thanks,
> Stefano

Right. And the idea behind it all was that if we send a credit
to the remote, then we have space for it.

I think the basic idea was that if we have actual allocated memory and
can copy data there, then we send the credit to the remote. Of course
that means an extra copy for every packet. So, as an optimization, it
seems that we just assume that we will be able to allocate a new
buffer.

First, this is not the best we can do. We can actually allocate memory
in the socket before sending credit. If the packet is small, then we
copy it there.
If the packet is big, then we queue the packet, take the buffer out of
the socket, and add it to the virtqueue.

The second question is what to do about medium-sized packets. If the
packet is 1 KB but the buffer is 4 KB, what do we do? And here I
wonder: why don't we add the remaining 3 KB of the buffer back to the
vq?

-- 
MST