From: Yongji Xie
Date: Tue, 29 Jun 2021 16:14:35 +0800
Subject: Re: RE: [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace
To: "Liu, Xiaodong"
Cc: Jason Wang, mst@redhat.com, stefanha@redhat.com, sgarzare@redhat.com,
    parav@nvidia.com, hch@infradead.org, christian.brauner@canonical.com,
    rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk,
    axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net,
    mika.penttila@nextfour.com, dan.carpenter@oracle.com, joro@8bytes.org,
    gregkh@linuxfoundation.org, songmuchun@bytedance.com,
    virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
    kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org
References: <20210615141331.407-1-xieyongji@bytedance.com>
    <20210628103309.GA205554@storage2.sh.intel.com>
    <41cc419e-48b5-6755-0cb0-9033bd1310e4@redhat.com>

On Tue, Jun 29, 2021 at 3:56 PM Liu, Xiaodong wrote:
>
> >-----Original Message-----
> >From: Jason Wang
> >Sent: Tuesday, June 29, 2021 12:11 PM
> >To: Liu, Xiaodong; Xie Yongji; mst@redhat.com; stefanha@redhat.com;
> >sgarzare@redhat.com; parav@nvidia.com; hch@infradead.org;
> >christian.brauner@canonical.com; rdunlap@infradead.org; willy@infradead.org;
> >viro@zeniv.linux.org.uk; axboe@kernel.dk; bcrl@kvack.org; corbet@lwn.net;
> >mika.penttila@nextfour.com; dan.carpenter@oracle.com; joro@8bytes.org;
> >gregkh@linuxfoundation.org
> >Cc: songmuchun@bytedance.com; virtualization@lists.linux-foundation.org;
> >netdev@vger.kernel.org; kvm@vger.kernel.org; linux-fsdevel@vger.kernel.org;
> >iommu@lists.linux-foundation.org; linux-kernel@vger.kernel.org
> >Subject: Re: [PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace
> >
> >
> >On 2021/6/28 1:54 PM, Liu, Xiaodong wrote:
> >>> Several issues:
> >>>
> >>> - VDUSE needs to limit the total size of the bounce buffers (64M if I'm
> >>> not wrong). Does it work for SPDK?
> >> Yes, Jason. It is enough and works for SPDK.
> >> Since it's a kind of bounce buffer mainly for in-flight IO, a limited
> >> size like 64MB is enough.
> >
> >
> >Ok.
> >
> >
> >>
> >>> - VDUSE can use hugepages but I'm not sure we can mandate hugepages (or
> >>> we need to introduce new flags for supporting this)
> >> Like you, I'm afraid that it is hard for a kernel module to directly
> >> preallocate hugepages internally.
> >> What I tried is this:
> >> 1. A simple agent daemon (representing one device) preallocates and maps
> >> dozens of 2MB hugepages (64MB in total, for example) for that device.
> >> 2. The daemon passes its mapping address and length and the hugepage fd
> >> to the kernel module through a newly created ioctl.
> >> 3. The kernel module remaps the hugepages inside the kernel.
> >
> >
> >Such a model should work, but the main "issue" is that it introduces
> >overhead in the case of vhost-vDPA.
> >
> >Note that in the case of vhost-vDPA, we don't use a bounce buffer; the
> >userspace pages are shared directly.
> >
> >And since DMA is not done per page, it prevents us from using tricks
> >like vm_insert_page() in those cases.
> >
>
> Yes, handling the vhost-vDPA case really is a problem.
> But there are already several solutions for serving VMs, like vhost-user
> and vfio-user, so at least SPDK won't serve VMs through VDUSE. If a user
> still wants to do that, the user should tolerate the introduced overhead.
>
> In other words, a software backend like SPDK would use the virtio
> datapath of VDUSE to serve the local host rather than VMs. That's why I
> also drafted "virtio-local" to bridge the vhost-user target and the local
> host's kernel virtio-blk.
>
> >
> >> 4. The vhost-user target gets the hugepage fd from the kernel module
> >> in a vhost-user message, via Unix domain socket cmsg, and maps it.
> >> Then the kernel module and the target map the same hugepage-based
> >> bounce buffer for in-flight IO.
> >>
> >> If VDUSE had an option to map userspace-preallocated memory, then
> >> VDUSE should be able to mandate it even if it is hugepage based.
> >>
> >
> >As above, this requires some kind of re-design since VDUSE depends on
> >the model of mmap(MAP_SHARED) instead of umem registration.
>
> Got it, Jason, this may be hard for the current version of VDUSE.
> Maybe we can consider these options later, after VDUSE is merged.
>
> If the VDUSE datapath could be leveraged directly by a vhost-user target,
> its value would be propagated immediately.
>

Agreed!

Thanks,
Yongji
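
For readers following the thread: the fd-passing idea in step 4 above (one side owns a buffer, hands its file descriptor to another party through Unix domain socket ancillary data, and both then map the same memory with MAP_SHARED) can be sketched in plain userspace code. This is only an illustration under loud assumptions: it uses an ordinary temporary file rather than hugepages, runs both "sides" in one process over a socketpair, and does not model the kernel module, the proposed ioctl, or any real VDUSE or vhost-user interface. The names send_fd, recv_fd, demo and BUF_SIZE are hypothetical.

```python
import mmap
import os
import socket
import struct
import tempfile

# Hypothetical size: one 2MB chunk. The thread talks about roughly 64MB made
# of 2MB hugepages; a regular temp file stands in for hugepage memory here.
BUF_SIZE = 2 * 1024 * 1024
FD_SIZE = struct.calcsize("i")


def send_fd(sock, fd):
    """Send one file descriptor as SCM_RIGHTS ancillary data (cmsg)."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, struct.pack("i", fd))]
    sock.sendmsg([b"F"], ancillary)  # at least one byte of normal data is required


def recv_fd(sock):
    """Receive one file descriptor that was sent with send_fd()."""
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_SPACE(FD_SIZE))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            return struct.unpack("i", data[:FD_SIZE])[0]
    raise RuntimeError("no file descriptor received")


def demo():
    """Share one buffer between an 'owner' and a 'target' via fd passing."""
    # Owner side: create the backing memory and map it shared.
    backing = tempfile.TemporaryFile()
    backing.truncate(BUF_SIZE)
    owner_map = mmap.mmap(backing.fileno(), BUF_SIZE, flags=mmap.MAP_SHARED)

    # A socketpair stands in for the Unix domain socket between the parties.
    owner_sock, target_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    send_fd(owner_sock, backing.fileno())

    # Target side: receive the fd and map the very same memory.
    fd = recv_fd(target_sock)
    target_map = mmap.mmap(fd, BUF_SIZE, flags=mmap.MAP_SHARED)

    # Data written through one mapping is visible through the other.
    owner_map[:5] = b"hello"
    seen = bytes(target_map[:5])

    target_map.close()
    owner_map.close()
    os.close(fd)
    owner_sock.close()
    target_sock.close()
    backing.close()
    return seen


if __name__ == "__main__":
    print(demo())
```

The sketch only shows why a shared mapping plus fd passing gives both parties the same bounce buffer without copying; whether VDUSE could accept userspace-preallocated memory this way is exactly the open design question discussed above.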