Date: Fri, 20 Apr 2018 16:52:28 +0300
From: "Michael S.
Tsirkin" <mst@redhat.com>
To: "Liang, Cunming"
Cc: "Bie, Tiwei", Jason Wang, alex.williamson@redhat.com, ddutile@redhat.com,
	"Duyck, Alexander H", virtio-dev@lists.oasis-open.org,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	"Daly, Dan", "Wang, Zhihong", "Tan, Jianfeng", "Wang, Xiao W",
	"Tian, Kevin"
Subject: Re: [RFC] vhost: introduce mdev based hardware vhost backend
Message-ID: <20180420165208-mutt-send-email-mst@kernel.org>
References: <20180402152330.4158-1-tiwei.bie@intel.com>
 <622f4bd7-1249-5545-dc5a-5a92b64f5c26@redhat.com>
 <20180410045723.rftsb7l4l3ip2ioi@debian>
 <30a63fff-7599-640a-361f-a27e5783012a@redhat.com>
 <20180419212911-mutt-send-email-mst@kernel.org>
 <20180420032806.i3jy7xb7emgil6eu@debian>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 20, 2018 at 03:50:41AM +0000, Liang, Cunming wrote:
> 
> > -----Original Message-----
> > From: Bie, Tiwei
> > Sent: Friday, April 20, 2018 11:28 AM
> > To: Michael S.
Tsirkin
> > Cc: Jason Wang; alex.williamson@redhat.com; ddutile@redhat.com;
> > Duyck, Alexander H; virtio-dev@lists.oasis-open.org;
> > linux-kernel@vger.kernel.org; kvm@vger.kernel.org;
> > virtualization@lists.linux-foundation.org; netdev@vger.kernel.org;
> > Daly, Dan; Liang, Cunming; Wang, Zhihong; Tan, Jianfeng;
> > Wang, Xiao W; Tian, Kevin
> > Subject: Re: [RFC] vhost: introduce mdev based hardware vhost backend
> >
> > On Thu, Apr 19, 2018 at 09:40:23PM +0300, Michael S. Tsirkin wrote:
> > > On Tue, Apr 10, 2018 at 03:25:45PM +0800, Jason Wang wrote:
> > > > > > > One problem is that different virtio ring compatible devices
> > > > > > > may have different device interfaces. That is to say, we will
> > > > > > > need different drivers in QEMU. It could be troublesome. And
> > > > > > > that's what this patch is trying to fix. The idea behind this
> > > > > > > patch is very simple: mdev is a standard way to emulate a
> > > > > > > device in the kernel.
> > > > > >
> > > > > > So you just move the abstraction layer from qemu to the kernel,
> > > > > > and you still need different drivers in the kernel for different
> > > > > > device interfaces of accelerators. This looks even more complex
> > > > > > than leaving it in qemu. As you said, another idea is to
> > > > > > implement a userspace vhost backend for accelerators, which
> > > > > > seems easier and could co-work with other parts of qemu without
> > > > > > inventing new types of messages.
> > > > >
> > > > > I'm not quite sure. Do you think it's acceptable to add various
> > > > > vendor specific hardware drivers in QEMU?
> > > > >
> > > >
> > > > I don't object, but we need to figure out the advantages of doing
> > > > it in qemu too.
> > > >
> > > > Thanks
> > >
> > > To be frank, the kernel is exactly where device drivers belong. DPDK
> > > did move them to userspace, but that's merely a requirement for the
> > > data path.
> > > *If* you can have them in the kernel, that is best:
> > > - update the kernel and there's no need to rebuild userspace
> > > - apps can be written in any language, no need to maintain multiple
> > >   libraries or add wrappers
> > > - security concerns are much smaller (ok, people are trying to
> > >   raise the bar with IOMMUs and such, but it's already pretty
> > >   good even without)
> > >
> > > The biggest issue is that you let userspace poke at the device, which
> > > is also allowed by the IOMMU to poke at kernel memory (needed for
> > > the kernel driver to work).
> >
> > I think the device won't and shouldn't be allowed to poke at kernel
> > memory. Its kernel driver needs some kernel memory to work, but the
> > device doesn't have access to it. Instead, the device only has access
> > to:
> >
> > (1) the entire memory of the VM (if vIOMMU isn't used) or
> > (2) the memory that belongs to the guest virtio device (if
> >     vIOMMU is being used).
> >
> > Below is the reason:
> >
> > For the first case, we should program the IOMMU for the hardware
> > device based on the info in the memory table, which covers the entire
> > memory of the VM.
> >
> > For the second case, we should program the IOMMU for the hardware
> > device based on the info in the shadow page table of the vIOMMU.
> >
> > So the memory that can be accessed by the device is limited, and it
> > should be safe, especially in the second case.
> >
> > My concern is that, in this RFC, we don't program the IOMMU for the
> > mdev device in userspace via the VFIO API directly. Instead, we pass
> > the memory table to the kernel driver via the mdev device (BAR0) and
> > ask the driver to do the IOMMU programming. Some people may not like
> > that. The main reason why we don't program the IOMMU via the VFIO API
> > in userspace directly is that, currently, IOMMU drivers don't support
> > the mdev bus.
> >
> > > Yes, maybe if the device is not buggy it's all fine, but it's better
> > > if we do not have to trust the device, otherwise the security
> > > picture becomes more murky.
> > >
> > > I suggested attaching a PASID to (some) queues - see my old post
> > > "using PASIDs to enable a safe variant of direct ring access".
> Ideally we can have the device bound to its normal driver in the host,
> while supporting on-demand allocation of a few queues with a PASID
> attached. Through the vhost mdev transport channel, the data path
> capability of those queues (as a device) can be exposed to the qemu
> vhost adaptor as a vDPA instance. Then we can avoid the VF number
> limitation and provide vhost data path acceleration at a finer
> granularity.

Exactly my point.

> > It's pretty cool. We also have some similar ideas.
> > Cunming will talk more about this.
> >
> > Best regards,
> > Tiwei Bie
> >
> > >
> > > Then using IOMMU with VFIO to limit access through the queue to the
> > > correct ranges of memory.
> > >
> > > --
> > > MST