From: Felipe Franciosi
To: Maxim Levitsky
CC: Keith Busch, Stefan Hajnoczi, Fam Zheng, "kvm@vger.kernel.org", Wolfram Sang, "linux-nvme@lists.infradead.org", "linux-kernel@vger.kernel.org", Keith Busch, Kirti Wankhede, Mauro Carvalho Chehab, "Paul E . McKenney", Christoph Hellwig, Sagi Grimberg, "Harris, James R", Liang Cunming, Jens Axboe, Alex Williamson, Thanos Makatos, John Ferlan, Liu Changpeng, Greg Kroah-Hartman, Nicolas Ferre, Paolo Bonzini, Amnon Ilan, "David S . Miller"
Subject: Re:
Date: Fri, 22 Mar 2019 07:54:50 +0000
Message-ID: <0E8918CB-F679-4A5C-92AD-239E9CEC260C@nutanix.com>
In-Reply-To: <8698ad583b1cfe86afc3d5440be630fc3e8e0680.camel@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> On Mar 21, 2019, at 5:04 PM, Maxim Levitsky wrote:
>
> On Thu, 2019-03-21 at 16:41 +0000, Felipe Franciosi wrote:
>>> On Mar 21, 2019, at 4:21 PM, Keith Busch wrote:
>>>
>>> On Thu, Mar 21, 2019 at 04:12:39PM +0000, Stefan Hajnoczi wrote:
>>>> mdev-nvme seems like a duplication of SPDK. The performance is not
>>>> better and the features are more limited, so why focus on this approach?
>>>>
>>>> One argument might be that the kernel NVMe subsystem wants to offer this
>>>> functionality and loading the kernel module is more convenient than
>>>> managing SPDK to some users.
>>>>
>>>> Thoughts?
>>>
>>> Doesn't SPDK bind a controller to a single process? mdev binds to
>>> namespaces (or their partitions), so you could have many mdev's assigned
>>> to many VMs accessing a single controller.
>>
>> Yes, it binds to a single process which can drive the datapath of multiple
>> virtual controllers for multiple VMs (similar to what you described for mdev).
>> You can therefore efficiently poll multiple VM submission queues (and multiple
>> device completion queues) from a single physical CPU.
>>
>> The same could be done in the kernel, but the code gets complicated as you add
>> more functionality to it. As this is a direct interface with an untrusted
>> front-end (the guest), it's also arguably safer to do in userspace.
>>
>> Worth noting: you can eventually have a single physical core polling all sorts
>> of virtual devices (eg. virtual storage or network controllers) very
>> efficiently. And this is quite configurable, too. In the interest of fairness,
>> performance or efficiency, you can choose to dynamically add or remove queues
>> to the poll thread or spawn more threads and redistribute the work.
>>
>> F.
>
> Note though that SPDK doesn't support sharing the device between host and the
> guests, it takes over the nvme device, thus it makes the kernel nvme driver
> unbind from it.

That is absolutely true. However, I find it not to be a problem in practice. Hypervisor products, especially those caring about performance, efficiency and fairness, will dedicate NVMe devices to a particular purpose (e.g. vDisk storage, cache, metadata) and will not share these devices for other use cases. That's because these products want to deterministically control the performance aspects of the device, which you just cannot do if you are sharing the device with a subsystem you do not control.
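
To make the polling model from the quoted text concrete (one thread servicing the queues of several VMs, with queues added or removed at run time), here is a rough sketch in Python. It is only a toy model under my own assumptions: `PollThread`, `budget` and the queue names are hypothetical and are not SPDK's API, and plain in-memory deques stand in for real submission queues. The per-queue budget is one simple way to get the fairness mentioned above.

```python
from collections import deque


class PollThread:
    """Toy model of one poll thread servicing many queues (hypothetical, not SPDK's API)."""

    def __init__(self, budget=2):
        self.queues = {}      # queue name -> deque of pending requests
        self.budget = budget  # max requests taken from one queue per pass

    def add_queue(self, name):
        # Attach a new (empty) queue to this poll thread.
        self.queues[name] = deque()

    def remove_queue(self, name):
        # Detach a queue and return what is still pending,
        # so that another poll thread could take it over.
        return self.queues.pop(name)

    def submit(self, name, request):
        # Stand-in for a guest placing a request on its submission queue.
        self.queues[name].append(request)

    def poll_once(self):
        """One pass over every queue; returns the requests picked up in this pass."""
        picked = []
        for name, q in list(self.queues.items()):
            # Take at most `budget` requests so one busy queue cannot starve the rest.
            for _ in range(min(self.budget, len(q))):
                picked.append((name, q.popleft()))
        return picked


poller = PollThread(budget=2)
poller.add_queue("vm1-sq")
poller.add_queue("vm2-sq")
for i in range(3):
    poller.submit("vm1-sq", f"read-{i}")
poller.submit("vm2-sq", "write-0")

first_pass = poller.poll_once()
second_pass = poller.poll_once()
```

With a budget of 2, the first pass takes two requests from the busy queue and still reaches the second VM's queue; the third request waits for the next pass. Redistributing work would amount to calling `remove_queue` on one thread and handing the returned deque to another.
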
For scenarios where the device must be shared and such fine-grained control is not required, it looks like using the kernel driver with io_uring offers very good performance with flexibility.

Cheers,
Felipe