From: Yongji Xie
Date: Tue, 11 Jan 2022 20:57:49 +0800
Subject: Re: [PATCH v12 00/13] Introduce VDUSE - vDPA Device in Userspace
To: "Michael S. Tsirkin"
Cc: Jason Wang, Stefan Hajnoczi, Stefano Garzarella, Parav Pandit, Christoph Hellwig, Christian Brauner, Randy Dunlap, Matthew Wilcox, Al Viro, Jens Axboe, bcrl@kvack.org, Jonathan Corbet, Mika Penttilä, Dan Carpenter, joro@8bytes.org, Greg KH, He Zhe, Liu Xiaodong, Joe Perches, Robin Murphy, Will Deacon, John Garry, songmuchun@bytedance.com, virtualization, Netdev, kvm, linux-fsdevel@vger.kernel.org, iommu@lists.linux-foundation.org, linux-kernel

On Tue, Jan 11, 2022 at 7:54 PM Michael S. Tsirkin wrote:
>
> On Tue, Jan 11, 2022 at 11:31:37AM +0800, Yongji Xie wrote:
> > On Mon, Jan 10, 2022 at 11:44 PM Michael S.
> > Tsirkin wrote:
> > >
> > > On Mon, Jan 10, 2022 at 11:24:40PM +0800, Yongji Xie wrote:
> > > > On Mon, Jan 10, 2022 at 11:10 PM Michael S. Tsirkin wrote:
> > > > >
> > > > > On Mon, Jan 10, 2022 at 09:54:08PM +0800, Yongji Xie wrote:
> > > > > > On Mon, Jan 10, 2022 at 8:57 PM Michael S. Tsirkin wrote:
> > > > > > >
> > > > > > > On Mon, Aug 30, 2021 at 10:17:24PM +0800, Xie Yongji wrote:
> > > > > > > > This series introduces a framework that makes it possible to
> > > > > > > > implement software-emulated vDPA devices in userspace. To make
> > > > > > > > the device emulation more secure, the emulated vDPA device's
> > > > > > > > control path is handled in the kernel and only the data path is
> > > > > > > > implemented in userspace.
> > > > > > > >
> > > > > > > > Since the emulated vDPA device's control path is handled in the
> > > > > > > > kernel, a message mechanism is introduced to make userspace aware
> > > > > > > > of data-path-related changes. Userspace can use read()/write() to
> > > > > > > > receive and reply to the control messages.
> > > > > > > >
> > > > > > > > In the data path, the core task is mapping the DMA buffer into
> > > > > > > > the VDUSE daemon's address space, which can be implemented in
> > > > > > > > different ways depending on the vdpa bus to which the vDPA device
> > > > > > > > is attached.
> > > > > > > >
> > > > > > > > In the virtio-vdpa case, we implement an MMU-based software IOTLB
> > > > > > > > with a bounce-buffering mechanism to achieve that. In the
> > > > > > > > vhost-vdpa case, the DMA buffer resides in a userspace memory
> > > > > > > > region which can be shared with the VDUSE userspace process by
> > > > > > > > transferring the shmfd.
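The read()/write() control channel described above can be pictured as a simple request/reply loop in the daemon. This is a minimal sketch under stated assumptions, not the VDUSE ABI: `struct demo_request`, `struct demo_response`, the `DEMO_*` message types and `demo_handle_one()` are hypothetical stand-ins for what the kernel's `<linux/vduse.h>` actually defines, and a `socketpair()` plays the kernel's role so the loop can run without a VDUSE device.

```c
/* Hypothetical control-message layout; NOT the real <linux/vduse.h> ABI. */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

enum demo_msg_type { DEMO_SET_STATUS = 1, DEMO_GET_VQ_STATE = 2 };

struct demo_request {
    uint32_t type;       /* which control-path event occurred */
    uint32_t request_id; /* echoed in the reply so the sender can match it */
    uint32_t arg;        /* new device status, or a virtqueue index */
};

struct demo_response {
    uint32_t request_id;
    uint32_t result;     /* 0 = OK, 1 = failed */
    uint32_t value;      /* payload, e.g. a virtqueue's available index */
};

#define DEMO_NUM_VQS 4

struct demo_dev {
    uint8_t status;                      /* last status set by the control path */
    uint16_t vq_avail_idx[DEMO_NUM_VQS]; /* data-path state the daemon tracks */
};

/* Read one request from fd, update or query daemon state, write one reply.
 * Returns 0 on success, -1 on a short read/write or EOF. */
static int demo_handle_one(int fd, struct demo_dev *dev)
{
    struct demo_request req;
    struct demo_response resp;

    if (read(fd, &req, sizeof(req)) != (ssize_t)sizeof(req))
        return -1;

    memset(&resp, 0, sizeof(resp));
    resp.request_id = req.request_id;
    switch (req.type) {
    case DEMO_SET_STATUS:
        dev->status = (uint8_t)req.arg; /* a real daemon would start/stop queues here */
        break;
    case DEMO_GET_VQ_STATE:
        if (req.arg < DEMO_NUM_VQS)
            resp.value = dev->vq_avail_idx[req.arg];
        else
            resp.result = 1;
        break;
    default:
        resp.result = 1; /* unknown message type */
        break;
    }
    return write(fd, &resp, sizeof(resp)) == (ssize_t)sizeof(resp) ? 0 : -1;
}

/* Stand-in for the kernel side: send one request over a socketpair,
 * let the daemon handle it, and collect the reply. */
static int demo_roundtrip(struct demo_dev *dev, const struct demo_request *req,
                          struct demo_response *resp)
{
    int sv[2];
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) != 0)
        return -1;
    if (write(sv[0], req, sizeof(*req)) == (ssize_t)sizeof(*req) &&
        demo_handle_one(sv[1], dev) == 0 &&
        read(sv[0], resp, sizeof(*resp)) == (ssize_t)sizeof(*resp))
        rc = 0;
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

SOCK_SEQPACKET is used so that each write() arrives as one whole message, which matches the fixed-size request/reply framing assumed in this sketch.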
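For the virtio-vdpa case, the idea of a software IOTLB with bounce buffering can be sketched in a few lines: on map, space is reserved in a region the daemon can see, an IOVA is recorded, and driver data is copied in if the device will read it; on unmap, device-written data is copied back to the driver's buffer. Everything here (`struct soft_iotlb`, `iotlb_map()`, `iotlb_translate()`, `iotlb_unmap()`, the 4 KiB region, using the region offset as the IOVA, and a bump allocator that never reclaims space) is a hypothetical simplification, not the VDUSE implementation.

```c
/* Hypothetical software IOTLB with bounce buffering; not VDUSE's implementation. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOUNCE_SIZE 4096u /* size of the region the daemon would mmap (assumed) */
#define MAX_MAPS 16u
#define IOVA_INVALID UINT64_MAX

enum bb_dir { BB_TO_DEVICE, BB_FROM_DEVICE };

struct bb_map {
    uint64_t iova;   /* address the daemon uses */
    size_t len;
    void *orig;      /* driver's buffer, invisible to the daemon */
    enum bb_dir dir;
    int used;
};

struct soft_iotlb {
    uint8_t bounce[BOUNCE_SIZE]; /* shared bounce region */
    size_t next;                 /* bump-allocator offset; never reclaimed here */
    struct bb_map maps[MAX_MAPS];
};

/* Map a driver buffer: reserve bounce space, record the IOVA, and copy the
 * data in if the device is going to read it. Returns IOVA_INVALID on failure. */
static uint64_t iotlb_map(struct soft_iotlb *t, void *buf, size_t len, enum bb_dir dir)
{
    if (len > BOUNCE_SIZE - t->next)
        return IOVA_INVALID;
    for (size_t i = 0; i < MAX_MAPS; i++) {
        if (t->maps[i].used)
            continue;
        uint64_t iova = t->next; /* in this sketch the IOVA is just the offset */
        t->next += len;
        t->maps[i] = (struct bb_map){ iova, len, buf, dir, 1 };
        if (dir == BB_TO_DEVICE)
            memcpy(t->bounce + iova, buf, len);
        return iova;
    }
    return IOVA_INVALID;
}

/* Daemon-side lookup: translate [iova, iova + len) to a pointer into the
 * bounce region, or NULL if no live mapping covers it. */
static void *iotlb_translate(struct soft_iotlb *t, uint64_t iova, size_t len)
{
    for (size_t i = 0; i < MAX_MAPS; i++) {
        const struct bb_map *m = &t->maps[i];
        if (m->used && iova >= m->iova && iova + len <= m->iova + m->len)
            return t->bounce + iova;
    }
    return NULL;
}

/* Unmap: copy device-written data back to the driver's buffer, then drop
 * the mapping. Returns 0 on success, -1 if the IOVA is unknown. */
static int iotlb_unmap(struct soft_iotlb *t, uint64_t iova)
{
    for (size_t i = 0; i < MAX_MAPS; i++) {
        struct bb_map *m = &t->maps[i];
        if (!m->used || m->iova != iova)
            continue;
        if (m->dir == BB_FROM_DEVICE)
            memcpy(m->orig, t->bounce + iova, m->len);
        m->used = 0;
        return 0;
    }
    return -1;
}
```

Copying only in the direction that matters (in on map for device-bound data, back on unmap for device-written data) is the usual bounce-buffer rule; a real implementation would also need a proper free-space allocator and locking, both omitted here.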
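In the vhost-vdpa case no copy is needed: the DMA buffers already live in a userspace memory region that can be shared, so handing the daemon a descriptor for it lets both sides mmap() the same pages. How VDUSE itself delivers that descriptor is not shown here. As a generic POSIX illustration, this sketch passes a descriptor over a Unix socket with SCM_RIGHTS and checks that the receiver's mapping sees the sender's writes; an unlinked temporary file stands in for the shared-memory fd, and `send_fd()`, `recv_fd()` and `demo_share()` are hypothetical helper names.

```c
/* Hypothetical helpers illustrating fd passing + shared mmap; not VDUSE code. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Pass one fd over a connected AF_UNIX socket as SCM_RIGHTS ancillary data. */
static int send_fd(int sock, int fd)
{
    char byte = 0; /* at least one byte of real data must accompany the fd */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by send_fd(); returns the new descriptor or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}

/* Sender writes into a shared mapping and passes the fd; the receiver maps the
 * same fd and should see the same bytes. Returns 1 if it does, 0 otherwise. */
static int demo_share(const char *text)
{
    const size_t len = 4096;
    char path[] = "/tmp/demo-shm-XXXXXX";
    int sv[2], fd, ok = 0;

    if (strlen(text) >= len || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return 0;
    fd = mkstemp(path);
    if (fd >= 0)
        unlink(path); /* only the descriptor is needed from here on */
    if (fd >= 0 && ftruncate(fd, (off_t)len) == 0) {
        char *w = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (w != MAP_FAILED) {
            strcpy(w, text);
            int rfd = send_fd(sv[0], fd) == 0 ? recv_fd(sv[1]) : -1;
            if (rfd >= 0) {
                char *r = mmap(NULL, len, PROT_READ, MAP_SHARED, rfd, 0);
                if (r != MAP_FAILED) {
                    ok = strcmp(r, text) == 0;
                    w[len - 1] = 'Z'; /* later writes are visible too */
                    ok = ok && r[len - 1] == 'Z';
                    munmap(r, len);
                }
                close(rfd);
            }
            munmap(w, len);
        }
    }
    if (fd >= 0)
        close(fd);
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

Because both mappings are MAP_SHARED views of the same file, the receiver also sees writes made after the descriptor is passed, which is the property that lets a daemon work on shared buffers in place.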
> > > > > > > >
> > > > > > > > The details and our use case are shown below:
> > > > > > > >
> > > > > > > > ------------------------    -------------------------   ----------------------------------------------
> > > > > > > > |            Container |    |              QEMU(VM) |   |                               VDUSE daemon |
> > > > > > > > |       ---------      |    |  -------------------  |   | ------------------------- ---------------- |
> > > > > > > > |       |dev/vdx|      |    |  |/dev/vhost-vdpa-x|  |   | | vDPA device emulation | | block driver | |
> > > > > > > > ------------+-----------    ------------+------------   -------------+----------------------+---------
> > > > > > > >             |                           |                            |                      |
> > > > > > > >             |                           |                            |                      |
> > > > > > > > ------------+---------------------------+----------------------------+----------------------+---------
> > > > > > > > |    | block device |            | vhost device |             | vduse driver |          | TCP/IP |   |
> > > > > > > > |    -------+--------            -------+--------             -------+--------          ----+-----   |
> > > > > > > > |           |                           |                            |                      |        |
> > > > > > > > | ----------+----------       ----------+----------           -------+-------               |        |
> > > > > > > > | | virtio-blk driver |       | vhost-vdpa driver |           | vdpa device |               |        |
> > > > > > > > | ----------+----------       ----------+----------           -------+-------               |        |
> > > > > > > > |           |      virtio bus           |                            |                      |        |
> > > > > > > > |   --------+----+-----------           |                            |                      |        |
> > > > > > > > |                |                      |                            |                      |        |
> > > > > > > > |      ----------+----------            |                            |                      |        |
> > > > > > > > |      | virtio-blk device |            |                            |                      |        |
> > > > > > > > |      ----------+----------            |                            |                      |        |
> > > > > > > > |                |                      |                            |                      |        |
> > > > > > > > |     -----------+----------            |                            |                      |        |
> > > > > > > > |     | virtio-vdpa driver |            |                            |                      |        |
> > > > > > > > |     -----------+----------            |                            |                      |        |
> > > > > > > > |                |                      |                            |    vdpa bus          |        |
> > > > > > > > |     -----------+----------------------+----------------------------+------------          |        |
> > > > > > > > |                                                                                        ---+---     |
> > > > > > > > -----------------------------------------------------------------------------------------| NIC |------
> > > > > > > >                                                                                          ---+---
> > > > > > > >                                                                                             |
> > > > > > > >                                                                                    ---------+---------
> > > > > > > >                                                                                    | Remote Storages |
> > > > > > > >                                                                                    -------------------
> > > > > > > >
> > > > > > > > We use it to implement a block device connected to our
> > > > > > > > distributed storage, which can be used in both containers and
> > > > > > > > VMs. Thus, we can have a unified technology stack for these two
> > > > > > > > cases.
> > > > > > > >
> > > > > > > > To test it with null-blk:
> > > > > > > >
> > > > > > > >   $ qemu-storage-daemon \
> > > > > > > >       --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
> > > > > > > >       --monitor chardev=charmonitor \
> > > > > > > >       --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \
> > > > > > > >       --export type=vduse-blk,id=test,node-name=disk0,writable=on,name=vduse-null,num-queues=16,queue-size=128
> > > > > > > >
> > > > > > > > The qemu-storage-daemon can be found at https://github.com/bytedance/qemu/tree/vduse
> > > > > > >
> > > > > > > It's been half a year; any plans to upstream this?
> > > > > >
> > > > > > Yeah, this is on my to-do list this month.
> > > > > >
> > > > > > Sorry for taking so long... I've been working on another project
> > > > > > enabling userspace RDMA with VDUSE for the past few months, so I
> > > > > > didn't have much time for this. Anyway, I will submit the first
> > > > > > version as soon as possible.
> > > > > >
> > > > > > Thanks,
> > > > > > Yongji
> > > > >
> > > > > Oh fun. You mean like virtio-rdma? Or RDMA as a backend for regular
> > > > > virtio?
> > > > >
> > > >
> > > > Yes, like virtio-rdma. Then we can develop something like a userspace
> > > > rxe or siw, or a custom protocol, with VDUSE.
> > > >
> > > > Thanks,
> > > > Yongji
> > >
> > > It would be interesting to see the spec for that.
> >
> > Will send it ASAP.
> >
> > > The issues with RDMA revolved around the fact that current
> > > apps tend to either use non-standard protocols for connection
> > > establishment or use UD where there's IIRC no standard
> > > at all. So QP numbers are hard to virtualize.
> > > Similarly, many use LIDs directly, with the same effect.
> > > GUIDs might be virtualizable, but no one went to the effort.
> > >
> >
> > Actually, we aimed at emulating a soft RDMA device with a normal NIC
> > (without using RDMA capability) rather than virtualizing a physical
> > RDMA NIC into several vRDMA devices. If so, I think we won't have
> > those issues, right?
>
> Right, maybe you won't.
>
> > > To say nothing about the interaction with memory overcommit.
> > >
> >
> > I don't get you here. Could you give me more details?
> >
> > Thanks,
> > Yongji
>
> RDMA devices tend to want to pin the memory under DMA.
>

I see. Maybe something like dm or odp could be helpful.

Thanks,
Yongji