Message-ID: <8698ad583b1cfe86afc3d5440be630fc3e8e0680.camel@redhat.com>
Subject: Re:
From: Maxim Levitsky
To: Felipe Franciosi, Keith Busch
Cc: Stefan Hajnoczi, Fam Zheng, kvm@vger.kernel.org, Wolfram Sang,
    linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
    Keith Busch, Kirti Wankhede, Mauro Carvalho Chehab,
    Paul E. McKenney, Christoph Hellwig, Sagi Grimberg, "Harris, James R",
    Liang Cunming, Jens Axboe, Alex Williamson, Thanos Makatos, John Ferlan,
    Liu Changpeng, Greg Kroah-Hartman, Nicolas Ferre, Paolo Bonzini,
    Amnon Ilan, David S. Miller
Date: Thu, 21 Mar 2019 19:04:50 +0200
References: <20190319144116.400-1-mlevitsk@redhat.com>
            <488768D7-1396-4DD1-A648-C86E5CF7DB2F@nutanix.com>
            <42f444d22363bc747f4ad75e9f0c27b40a810631.camel@redhat.com>
            <20190321161239.GH31434@stefanha-x1.localdomain>
            <20190321162140.GA29342@localhost.localdomain>

On Thu, 2019-03-21 at 16:41 +0000, Felipe Franciosi wrote:
> > On Mar 21, 2019, at 4:21 PM, Keith Busch wrote:
> > 
> > On Thu, Mar 21, 2019 at 04:12:39PM +0000, Stefan Hajnoczi wrote:
> > > mdev-nvme seems like a duplication of SPDK. The performance is not
> > > better and the features are more limited, so why focus on this approach?
> > > 
> > > One argument might be that the kernel NVMe subsystem wants to offer this
> > > functionality and loading the kernel module is more convenient than
> > > managing SPDK to some users.
> > > 
> > > Thoughts?
> > 
> > Doesn't SPDK bind a controller to a single process? mdev binds to
> > namespaces (or their partitions), so you could have many mdev's assigned
> > to many VMs accessing a single controller.
> 
> Yes, it binds to a single process which can drive the datapath of multiple
> virtual controllers for multiple VMs (similar to what you described for mdev).
> You can therefore efficiently poll multiple VM submission queues (and multiple
> device completion queues) from a single physical CPU.
> 
> The same could be done in the kernel, but the code gets complicated as you add
> more functionality to it. As this is a direct interface with an untrusted
> front-end (the guest), it's also arguably safer to do in userspace.
> 
> Worth noting: you can eventually have a single physical core polling all sorts
> of virtual devices (eg. virtual storage or network controllers) very
> efficiently. And this is quite configurable, too. In the interest of fairness,
> performance or efficiency, you can choose to dynamically add or remove queues
> to the poll thread or spawn more threads and redistribute the work.
> 
> F.

Note though that SPDK doesn't support sharing the device between the host and
the guests: it takes over the NVMe device entirely, forcing the kernel nvme
driver to unbind from it.

My driver currently creates a polling thread per guest, but it would be trivial
to add an option to share one polling thread among many guests if there is a
need for that.

Best regards,
	Maxim Levitsky
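
(For illustration only: below is a minimal userspace C sketch of the shared-poller
idea discussed above, i.e. one thread draining the submission queues of several
guests, with queues attached at runtime. It is not taken from the patch series or
from SPDK; every name and structure in it is hypothetical.)

/* pollthread_sketch.c — illustrative only; build with: gcc -pthread pollthread_sketch.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define VQ_DEPTH   256
#define MAX_QUEUES 64

/* One guest-visible submission queue (heavily simplified). */
struct vq {
	_Atomic uint32_t head;		/* consumer index, advanced by the poller */
	_Atomic uint32_t tail;		/* producer index, advanced by the guest  */
	uint64_t cmds[VQ_DEPTH];	/* stand-in for NVMe submission entries   */
};

/* One polling thread that can service queues belonging to several guests. */
struct poller {
	pthread_mutex_t lock;		/* protects the queue list */
	struct vq *queues[MAX_QUEUES];
	int nr_queues;
	atomic_bool stop;
};

/* Drain whatever the guest has posted since the last pass. */
static int vq_poll_once(struct vq *q)
{
	uint32_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
	int done = 0;

	while (head != tail) {
		uint64_t cmd = q->cmds[head % VQ_DEPTH];
		(void)cmd;	/* a real driver would translate the command and
				   submit it to the physical controller here */
		head++;
		done++;
	}
	atomic_store_explicit(&q->head, head, memory_order_release);
	return done;
}

/* One physical CPU spins here on behalf of all attached guests. */
static void *poll_thread(void *arg)
{
	struct poller *p = arg;

	while (!atomic_load(&p->stop)) {
		pthread_mutex_lock(&p->lock);
		for (int i = 0; i < p->nr_queues; i++)
			vq_poll_once(p->queues[i]);
		pthread_mutex_unlock(&p->lock);
		/* a real poller would back off or sleep when all queues are idle */
	}
	return NULL;
}

/* Attach another guest's queue to the shared poller at runtime. */
static void poller_add_queue(struct poller *p, struct vq *q)
{
	pthread_mutex_lock(&p->lock);
	p->queues[p->nr_queues++] = q;
	pthread_mutex_unlock(&p->lock);
}

int main(void)
{
	struct poller p = { .lock = PTHREAD_MUTEX_INITIALIZER };
	struct vq q = { 0 };
	pthread_t tid;

	poller_add_queue(&p, &q);
	pthread_create(&tid, NULL, poll_thread, &p);

	/* pretend to be a guest posting one command */
	q.cmds[0] = 0x42;
	atomic_store_explicit(&q.tail, 1, memory_order_release);

	/* wait until the poller has consumed it, then shut down */
	while (atomic_load(&q.head) != 1)
		;
	atomic_store(&p.stop, true);
	pthread_join(tid, NULL);

	printf("poller drained %u command(s)\n", atomic_load(&q.head));
	return 0;
}

In an actual driver the per-queue work would be translating guest NVMe commands
and forwarding them to the physical controller's queues, and per-guest dedicated
threads versus a shared poller is then only a question of how queues are assigned
to poll threads.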