Subject: Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
To: Stefano Garzarella
Cc: "Michael S. Tsirkin", netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    Stefan Hajnoczi, "David S. Miller", virtualization@lists.linux-foundation.org,
    kvm@vger.kernel.org
References: <20190717113030.163499-1-sgarzare@redhat.com>
 <20190717113030.163499-5-sgarzare@redhat.com>
 <20190717105336-mutt-send-email-mst@kernel.org>
 <20190718041234-mutt-send-email-mst@kernel.org>
 <20190718072741-mutt-send-email-mst@kernel.org>
 <20190719080832.7hoeus23zjyrx3cc@steredhat>
 <20190719083920.67qo2umpthz454be@steredhat>
From: Jason Wang
Message-ID: <53da84b9-184f-1377-0582-ab7cf42ebdb6@redhat.com>
Date: Fri, 19 Jul 2019 16:51:00 +0800
In-Reply-To: <20190719083920.67qo2umpthz454be@steredhat>

On 2019/7/19 4:39 PM, Stefano Garzarella wrote:
> On Fri, Jul 19, 2019 at 04:21:52PM +0800, Jason Wang wrote:
>> On 2019/7/19 4:08 PM, Stefano Garzarella wrote:
>>> On Thu, Jul 18, 2019 at 07:35:46AM -0400, Michael S. Tsirkin wrote:
>>>> On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
>>>>> On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin wrote:
>>>>>> On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
>>>>>>> On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin wrote:
>>>>>>>> On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
>>>>>>>>> If the packets to be sent to the guest are bigger than the buffers
>>>>>>>>> available, we can split them, using multiple buffers and fixing
>>>>>>>>> the length in the packet header.
>>>>>>>>> This is safe since virtio-vsock supports only stream sockets.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Stefano Garzarella
>>>>>>>> So how does it work right now? If an app
>>>>>>>> does sendmsg with a 64K buffer and the other
>>>>>>>> side publishes 4K buffers - does it just stall?
>>>>>>> Before this series, 64K (or bigger) user messages were split into 4K
>>>>>>> packets (fixed in the code) and queued in an internal list for the TX
>>>>>>> worker.
>>>>>>>
>>>>>>> After this series, we will queue up to 64K packets and then they will
>>>>>>> be split in the TX worker, depending on the size of the buffers
>>>>>>> available in the vring. (The idea was to allow EWMA or a configuration
>>>>>>> of the buffer size, but for now we have postponed it.)
>>>>>> Got it. Using workers for xmit is IMHO a bad idea btw.
>>>>>> Why is it done like this?
>>>>> Honestly, I don't know the exact reasons for this design, but I suppose
>>>>> that the idea was to have only one worker that uses the vring, and
>>>>> multiple user threads that enqueue packets in the list.
>>>>> This can simplify the code, and we can put the user threads to sleep if
>>>>> we don't have "credit" available (this means that the receiver doesn't
>>>>> have space to receive the packet).
>>>> I think you mean the reverse: even without credits you can copy from
>>>> user and queue up data, then process it without waking up the user
>>>> thread.
>>> I checked the code again, but it doesn't seem to do that.
>>> The .sendmsg callback of af_vsock checks whether the transport has space
>>> (the virtio-vsock transport returns the credit available). If there is no
>>> space, it puts the thread to sleep on the 'sk_sleep(sk)' wait queue.
>>>
>>> When the transport receives an update of the credit available on the
>>> other peer, it calls 'sk->sk_write_space(sk)', which wakes up the
>>> sleeping thread so it can queue the new packet.
>>>
>>> So, in the current implementation, the TX worker doesn't check the
>>> credit available, it only sends the packets.
>>>> Does it help though? It certainly adds up work outside of
>>>> user thread context which means it's not accounted for
>>>> correctly.
>>> I can try to xmit the packet directly in the user thread context, to see
>>> the improvements.
>>
>> It will then look more like what virtio-net (and other networking devices)
>> do.
> I'll try ASAP, the changes should not be too complicated... I hope :)
>
>>
>>>> Maybe we want more VQs. Would help improve parallelism. The question
>>>> would then become how to map sockets to VQs. With a simple hash
>>>> it's easy to create collisions ...
>>> Yes, more VQs can help, but the mapping question is not simple to answer.
>>> Maybe we can do a hash on the (cid, port) or do some kind of estimation
>>> of queue utilization and try to balance.
>>> Should the mapping be unique?
>>
>> It sounds to me like you want some kind of fair queuing? We already have
>> several qdiscs that do this.
> Thanks for pointing it out!
>
>> So if we use the kernel networking xmit path, all those issues could be
>> addressed.
> One more point in favor of AF_VSOCK + net-stack, but we have to evaluate
> possible drawbacks in using the net-stack (e.g. more latency due to the
> complexity of the net-stack?).

Yes, we need to benchmark the performance. But as we've noticed, the current
vsock implementation is not efficient, and for stream sockets the overhead
should be minimal.

The most important thing is to avoid reinventing things that already exist.

Thanks

>
> Thanks,
> Stefano
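
Btw, on the multiple-VQ mapping question above: the simplest option would be
a plain hash of (cid, port) to a TX queue, along the lines of the sketch
below. jhash_2words() is the stock kernel helper; the function and parameter
names are made up for illustration. As Michael noted, a bare hash like this
makes collisions (and so unfairness between sockets) easy, which is one more
argument for reusing the existing qdisc/fair-queuing machinery instead of
reinventing it here.

#include <linux/jhash.h>
#include <linux/types.h>

/* Hypothetical example only, not actual vhost-vsock code: pick one of
 * num_tx_vqs TX virtqueues for the connection identified by (cid, port).
 */
static u16 sketch_vsock_select_txq(u32 cid, u32 port, u16 num_tx_vqs)
{
        /* jhash_2words() mixes the two ids with a seed of 0; distinct
         * connections can still land on the same queue (collisions).
         */
        return jhash_2words(cid, port, 0) % num_tx_vqs;
}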