Received: by 2002:a25:ad19:0:0:0:0:0 with SMTP id y25csp3646137ybi; Mon, 29 Jul 2019 10:03:42 -0700 (PDT) X-Google-Smtp-Source: APXvYqz6dTl/J/B3PUnOixDq/ljNBvPeuvO1Yh14kOtZ62gvz21pciUw3PiaZ93Yhzcqw0ifmSZR X-Received: by 2002:a17:902:20b:: with SMTP id 11mr112557245plc.78.1564419822512; Mon, 29 Jul 2019 10:03:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1564419822; cv=none; d=google.com; s=arc-20160816; b=lx4QyMM7ydggHms6KccdXKJ6WzKj5DJ3hXDnb9GdOmLv5EL7+Yss9qwAP1SuCM6r5f xm7NDQN3o3PMIbY1+wZkwjcpW8LB7FVXvDTwBSKRljaub9nUNxH7NQ8+EglecpymaWNP JyaiOfmgqRBVxKONjIPliK8/vZnvNY3NIcMkTpxSoK8XsarWDwVy0G2uV8lOg89y+nQN oTGbH/8MuqCcj2NefW3NG1CjXANKqS3fU0P3963swS7lQUX9ry/6NvHO/LKoJxT4TPtz rQw/Vao3UFDYBapLXukSu9U40RD/PwgNDpRaeYWw0Lp7y7Xjg7pML4X6PBxiH/jcmNNG gkAQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=HxGQ2fS0M6Vg7g7VHJrZaUAKmqWrcLECVicIPFYKW7M=; b=r+bHuVB6K6m0+mLK2tGXgZmOEt+WiwVEhFZKd8XAGcXccsAvGFcS3Bt4CxU+cwOJpX 7hdxzzSipONVG1ZI8MG5rVxgFvoTSJNodLioCDlG3CYmTmYUI3uoPceZ6axX1W2X0E1I /Zbg0fcUSxp0V0M+3PlTiuKlEeE5kpCX7hZ1T+XYquniKVoqLmNDAhen6xRx7B1JFYsb r/IaVXkJ7Mw4QZ/FQr8CGK0Ea6hqinhNny6BQtkMJ/BtldLrMgzo+TFau7GPDMqwcKwm XVyPTU74HMUxjyGZ/3I+jyMoijNZw8E7JA7RRdfILB6QySc0iXf2HNS5+/EVGYjs3jDP +Z+g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id h8si23504270plt.16.2019.07.29.10.03.26; Mon, 29 Jul 2019 10:03:42 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728541AbfG2QvB (ORCPT + 99 others); Mon, 29 Jul 2019 12:51:01 -0400 Received: from mail-wm1-f66.google.com ([209.85.128.66]:50666 "EHLO mail-wm1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728548AbfG2QvA (ORCPT ); Mon, 29 Jul 2019 12:51:00 -0400 Received: by mail-wm1-f66.google.com with SMTP id v15so54522177wml.0 for ; Mon, 29 Jul 2019 09:50:59 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=HxGQ2fS0M6Vg7g7VHJrZaUAKmqWrcLECVicIPFYKW7M=; b=c5l9RgVk4ZKAsE2SkyhgR2oOLiQtZbJresjpFWsh7q9167irNw6Z9nNwldJLlr2tRQ /IbBzUyu1KmpZryngzCX3+PY8eRPSxGS7ckF/IwdDBOnajQoWDuSteO3owJeCw6iySez GmXlir2hylz7AIKklvcSxAO0nSpiTzDtb0seX7hZkPygi+Z376KtO5+1yXoHsvizUsGr uSqUBwmRTPZ3WjnkJURHMxQlG92xz57mpG3vNO6/tf7UUIkLM8Cn5WbA3wwvPeu8dN1I BZAY3HT7eQ/NgXkkEL6Swm3mocHH7T6dnd2OtUdbi0xTEZUPeAmRXlkGXOU8c/MbNcY6 RikA== X-Gm-Message-State: APjAAAVrXzayotJ9Bwgx/DzRfhEDys5/x6OkVI5TGRZ4ny/lHkLZ89xs OrtFZ15ryI6X6g+pP52X+Th3Yw== X-Received: by 2002:a1c:7d08:: with SMTP id y8mr82606423wmc.50.1564419059190; Mon, 29 Jul 2019 09:50:59 -0700 (PDT) Received: from steredhat (host122-201-dynamic.13-79-r.retail.telecomitalia.it. [79.13.201.122]) by smtp.gmail.com with ESMTPSA id s10sm46971809wrt.49.2019.07.29.09.50.58 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Mon, 29 Jul 2019 09:50:58 -0700 (PDT) Date: Mon, 29 Jul 2019 18:50:56 +0200 From: Stefano Garzarella To: "Michael S. Tsirkin" Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Stefan Hajnoczi , "David S. Miller" , virtualization@lists.linux-foundation.org, Jason Wang , kvm@vger.kernel.org Subject: Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket Message-ID: <20190729165056.r32uzj6om3o6vfvp@steredhat> References: <20190717113030.163499-1-sgarzare@redhat.com> <20190717113030.163499-2-sgarzare@redhat.com> <20190729095956-mutt-send-email-mst@kernel.org> <20190729153656.zk4q4rob5oi6iq7l@steredhat> <20190729114302-mutt-send-email-mst@kernel.org> <20190729161903.yhaj5rfcvleexkhc@steredhat> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190729161903.yhaj5rfcvleexkhc@steredhat> User-Agent: NeoMutt/20180716 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jul 29, 2019 at 06:19:03PM +0200, Stefano Garzarella wrote: > On Mon, Jul 29, 2019 at 11:49:02AM -0400, Michael S. Tsirkin wrote: > > On Mon, Jul 29, 2019 at 05:36:56PM +0200, Stefano Garzarella wrote: > > > On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote: > > > > On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote: > > > > > Since virtio-vsock was introduced, the buffers filled by the host > > > > > and pushed to the guest using the vring, are directly queued in > > > > > a per-socket list. These buffers are preallocated by the guest > > > > > with a fixed size (4 KB). > > > > > > > > > > The maximum amount of memory used by each socket should be > > > > > controlled by the credit mechanism. > > > > > The default credit available per-socket is 256 KB, but if we use > > > > > only 1 byte per packet, the guest can queue up to 262144 of 4 KB > > > > > buffers, using up to 1 GB of memory per-socket. In addition, the > > > > > guest will continue to fill the vring with new 4 KB free buffers > > > > > to avoid starvation of other sockets. > > > > > > > > > > This patch mitigates this issue copying the payload of small > > > > > packets (< 128 bytes) into the buffer of last packet queued, in > > > > > order to avoid wasting memory. > > > > > > > > > > Reviewed-by: Stefan Hajnoczi > > > > > Signed-off-by: Stefano Garzarella > > > > > > > > This is good enough for net-next, but for net I think we > > > > should figure out how to address the issue completely. > > > > Can we make the accounting precise? What happens to > > > > performance if we do? > > > > > > > > > > In order to do more precise accounting maybe we can use the buffer size, > > > instead of payload size when we update the credit available. > > > In this way, the credit available for each socket will reflect the memory > > > actually used. > > > > > > I should check better, because I'm not sure what happen if the peer sees > > > 1KB of space available, then it sends 1KB of payload (using a 4KB > > > buffer). > > > > > > The other option is to copy each packet in a new buffer like I did in > > > the v2 [2], but this forces us to make a copy for each packet that does > > > not fill the entire buffer, perhaps too expensive. > > > > > > [2] https://patchwork.kernel.org/patch/10938741/ > > > > > > > > > Thanks, > > > Stefano > > > > Interesting. You are right, and at some level the protocol forces copies. > > > > We could try to detect that the actual memory is getting close to > > admin limits and force copies on queued packets after the fact. > > Is that practical? > > Yes, I think it is doable! > We can decrease the credit available with the buffer size queued, and > when the buffer size of packet to queue is bigger than the credit > available, we can copy it. > > > > > And yes we can extend the credit accounting to include buffer size. > > That's a protocol change but maybe it makes sense. > > Since we send to the other peer the credit available, maybe this > change can be backwards compatible (I'll check better this). What I said was wrong. We send a counter (increased when the user consumes the packets) and the "buf_alloc" (the max memory allowed) to the other peer. It makes a difference between a local counter (increased when the packets are sent) and the remote counter to calculate the credit available: u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit) { u32 ret; spin_lock_bh(&vvs->tx_lock); ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt); if (ret > credit) ret = credit; vvs->tx_cnt += ret; spin_unlock_bh(&vvs->tx_lock); return ret; } Maybe I can play with "buf_alloc" to take care of bytes queued but not used. Thanks, Stefano