Subject: Re: [PATCH 0/9] RFC: NVME VFIO mediated device [BENCHMARKS]
From: Maxim Levitsky
To: linux-nvme@lists.infradead.org
Cc: Fam Zheng, Keith Busch, Sagi Grimberg, kvm@vger.kernel.org, Wolfram Sang,
    Greg Kroah-Hartman, Liang Cunming, Nicolas Ferre, linux-kernel@vger.kernel.org,
    Kirti Wankhede, David S. Miller,
Miller" , Jens Axboe , Alex Williamson , John Ferlan , Mauro Carvalho Chehab , Paolo Bonzini , Liu Changpeng , "Paul E . McKenney" , Amnon Ilan , Christoph Hellwig Date: Mon, 25 Mar 2019 20:52:32 +0200 In-Reply-To: References: <20190319144116.400-1-mlevitsk@redhat.com> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Mon, 25 Mar 2019 18:52:40 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi This is first round of benchmarks. The system is Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz The system has 2 numa nodes, but only cpus and memory from node 0 were used to avoid noise from numa. The SSD is IntelĀ® Optaneā„¢ SSD 900P Series, 280 GB version https://ark.intel.com/content/www/us/en/ark/products/123628/intel-optane-ssd-900p-series-280gb-1-2-height-pcie-x4-20nm-3d-xpoint.html ** Latency benchmark with no interrupts at all ** spdk was complited with fio plugin in the host and in the guest. spdk was first run in the host then vm was started with one of spdk,pci passthrough, mdev and inside the vm spdk was run with fio plugin. spdk was taken from my branch on gitlab, and fio was complied from source for 3.4 branch as needed by the spdk fio plugin. The following spdk command line was used: $WORK/fio/fio \ --name=job --runtime=40 --ramp_time=0 --time_based \ --filename="trtype=PCIe traddr=$DEVICE_FOR_FIO ns=1" --ioengine=spdk \ --direct=1 --rw=randread --bs=4K --cpus_allowed=0 \ --iodepth=1 --thread The average values for slat (submission latency), clat (completion latency) and its sum (slat+clat) were noted. The results: spdk fio host: 573 Mib/s - slat 112.00ns, clat 6.400us, lat 6.52ms 573 Mib/s - slat 111.50ns, clat 6.406us, lat 6.52ms pci passthough host/ spdk fio guest 571 Mib/s - slat 124.56ns, clat 6.422us lat 6.55ms 571 Mib/s - slat 122.86ns, clat 6.410us lat 6.53ms 570 Mib/s - slat 124.95ns, clat 6.425us lat 6.55ms spdk host/ spdk fio guest: 535 Mib/s - slat 125.00ns, clat 6.895us lat 7.02ms 534 Mib/s - slat 125.36ns, clat 6.896us lat 7.02ms 534 Mib/s - slat 125.82ns, clat 6.892us lat 7.02ms mdev host/ spdk fio guest: 534 Mib/s - slat 128.04ns, clat 6.902us lat 7.03ms 535 Mib/s - slat 126.97ns, clat 6.900us lat 7.03ms 535 Mib/s - slat 127.00ns, clat 6.898us lat 7.03ms As you see, native latency is 6.52ms, pci passthrough barely adds any latency, while both mdev/spdk added about (7.03/2 - 6.52) - 0.51ms/0.50ms of latency. In addtion to that I added few 'rdtsc' into my mdev driver to strategically capture the cycle count it takes it to do 3 things: 1. translate a just received command (till it is copied to the hardware submission queue) 2. receive a completion (divided by the number of completion received in one round of polling) 3. deliver an interupt to the guest (call to eventfd_signal) This is not the whole latency as there is also a latency between the point the submission entry is written and till it is visible on the polling cpu, plus latency till polling cpu gets to the code which reads the submission entry, and of course latency of interrupt delivery, but the above measurements mostly capture the latency I can control. 
The results are:

commands translated : avg cycles: 459.844   avg time (usec): 0.135
commands completed  : avg cycles: 354.61    avg time (usec): 0.104
interrupts sent     : avg cycles: 590.227   avg time (usec): 0.174

avg time total: 0.413 usec

All measurements were done in the host kernel; the times were calculated
using the tsc_khz kernel variable.

The biggest takeaway from this is that both SPDK and my driver are very fast,
and the overhead is only about a thousand CPU cycles, give or take.

*** Throughput benchmarks ***

https://paste.fedoraproject.org/paste/ecijclLMG2B11MVCVIst-w

The throughput benchmarks can be found at the link above. The biggest
takeaway is that when using no interrupts (spdk fio in the guest or spdk fio
in the host), the bottleneck is the device itself, and throughput is about
2290 MiB/s.

Comparing mdev and SPDK with interrupts, my driver slightly wins with a
throughput of about 2015 MiB/s versus about 2005 MiB/s for SPDK, mostly due
to slightly different timings, as the latency of both is about the same.

Disabling the Meltdown mitigation didn't have much effect on the performance.

Best regards,
	Maxim Levitsky