Date: Mon, 23 Jun 2008 10:54:01 -0700
From: mark gross
To: FUJITA Tomonori
Cc: linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: Intel IOMMU (and IOMMU for Virtualization) performances
Message-ID: <20080623175401.GA17008@linux.intel.com>
Reply-To: mgross@linux.intel.com
References: <20080604235053K.fujita.tomonori@lab.ntt.co.jp> <20080605220216.GA12927@linux.intel.com> <20080606134839Z.fujita.tomonori@lab.ntt.co.jp>
In-Reply-To: <20080606134839Z.fujita.tomonori@lab.ntt.co.jp>

On Fri, Jun 06, 2008 at 01:44:30PM +0900, FUJITA Tomonori wrote:
> On Thu, 5 Jun 2008 15:02:16 -0700
> mark gross wrote:
>
> > On Wed, Jun 04, 2008 at 11:47:01PM +0900, FUJITA Tomonori wrote:
> > > I resumed the work to make the IOMMU respect drivers' DMA
> > > alignment (since I got a desktop box having VT-d). In short, some
> > > IOMMUs allocate memory areas spanning the driver's segment
> > > boundary limit (DMA alignment), which forces drivers to carry a
> > > workaround that splits scatter entries back into smaller chunks.
> > > To remove such workarounds from drivers, I modified several
> > > IOMMUs: X86_64 (Calgary and GART), Alpha, POWER, PARISC, IA64,
> > > SPARC64, and swiotlb.
> > >
> > > Now I'm trying to fix the Intel IOMMU code, namely its free space
> > > management algorithm.
> > >
> > > The major difference between the Intel IOMMU code and the others
> > > is that the Intel IOMMU code uses a Red-Black tree to manage free
> > > space while the others use a bitmap (swiotlb is the only
> > > exception).
> > >
> > > The Red-Black tree method consumes less memory than the bitmap
> > > method, but it incurs more overhead (the RB tree method needs to
> > > walk the tree, allocate a new item, and insert it every time it
> > > maps an I/O address). Intel IOMMU (and IOMMUs for virtualization)
> > > need multiple IOMMU address spaces. That's why the Red-Black tree
> > > method was chosen, I guess.
> > >
> > > Half a year ago, I tried to convert the POWER IOMMU code to the
> > > Red-Black method and saw a performance drop:
> > >
> > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2007-11/msg00650.html
> > >
> > > So I tried converting the Intel IOMMU code to the bitmap method
> > > to see how much I could gain.
> > >
> > > I didn't see noticeable performance differences with 1GbE, so I
> > > tried a modified driver for a SCSI HBA that just does memory
> > > accesses to emulate the performance of SSD disk drives, 10GbE,
> > > InfiniBand, etc.
> > >
> > > I got the following results with one thread issuing 1KB I/Os:
> > >
> > >                          IOPS (I/Os per second)
> > > IOMMU disabled           145253.1 (1.000)
> > > RB tree (mainline)       118313.0 (0.814)
> > > Bitmap                   128954.1 (0.887)
> >
> > FWIW: You'll see bigger deltas if you boot with intel_iommu=strict,
> > but those will be because of waiting on the IOMMU hardware to flush
> > caches, and they may further hide the effects of going with a
> > bitmap as opposed to an RB tree.
>
> Yeah, I know. I'll test the 'intel_iommu=strict' option next time.
>
> The patch also has an 'intel_iommu=strict' option. With it enabled,
> it flushes the TLB cache every time dma_unmap_* is called, as the
> original code does.
>
> > > I've attached the patch that converts the Intel IOMMU code to
> > > the bitmap method, but I have no intention of arguing that the
> > > Intel IOMMU code should consume more memory for better
> > > performance. :) I want to do more performance tests with 10GbE
> > > (probably I have to wait for a server box having VT-d, which is
> > > not available on the market now).
> > >
> > > As I said, what I want to do now is make the Intel IOMMU code
> > > respect drivers' DMA alignment. It's easier to do that if the
> > > Intel IOMMU code uses the bitmap method, since I can simply
> > > convert the IOMMU code to use lib/iommu-helper, but I can modify
> > > the RB tree method too.
> >
> > I'm going to be out of contact for a few weeks, but this work
> > sounds interesting.
>
> Why did you choose the RB tree instead of a traditional bitmap
> scheme to manage free space?

I inherited this code, and I'm passing it on to David Woodhouse soon.
I don't know why the RB tree was chosen over a bitmap. I guess it was
for scalability to many 10Gig I/O devices, but that's just a guess.

> > > I'm just interested in other people's opinions on IOMMU
> > > implementations, performance, possible future changes for
> > > performance improvement, etc.
> > >
> > > For further information:
> > >
> > > LSF'08 "Storage Track" summary by Grant Grundler:
> > > http://iou.parisc-linux.org/lsf2008/SUMMARY-Storage.txt
> > >
> > > My LSF'08 slides:
> > > http://iou.parisc-linux.org/lsf2008/IO-DMA_Representations-fujita_tomonori.pdf
> > >
> > > This patch is against the latest git tree (note that it just
> > > converts the Intel IOMMU code to use the bitmap; it doesn't make
> > > it respect drivers' DMA alignment yet).
> >
> > I'll look closely at your patch later.
>
> Thanks a lot!
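To make the tradeoff above concrete, here is a minimal, self-contained
userspace sketch of the bitmap method with the segment-boundary check
FUJITA describes, in the spirit of lib/iommu-helper. The names
(iova_bitmap_alloc, crosses_boundary) and the byte-per-page map are
illustrative simplifications, not the kernel's API; the real code packs
one bit per page and searches with find_next_zero_bit().

/*
 * Sketch: first-fit allocation from a page-granular map, skipping any
 * candidate range that would span a driver's segment boundary.
 */
#include <stdio.h>
#include <string.h>

#define IOVA_PAGES  64   /* pages tracked by the map            */
#define BOUNDARY    16   /* driver's segment boundary, in pages */

static unsigned char map[IOVA_PAGES];  /* 0 = free, 1 = allocated */

/* Would [start, start + nr) cross a BOUNDARY-aligned boundary? */
static int crosses_boundary(unsigned long start, unsigned long nr)
{
	return start / BOUNDARY != (start + nr - 1) / BOUNDARY;
}

/* Find nr contiguous free pages that do not span a boundary;
 * return the start index, or -1 if no such range exists. */
static long iova_bitmap_alloc(unsigned long nr)
{
	unsigned long start, i;

	for (start = 0; start + nr <= IOVA_PAGES; start++) {
		if (crosses_boundary(start, nr))
			continue;	/* a driver couldn't use this range */
		for (i = 0; i < nr; i++)
			if (map[start + i])
				break;	/* hit an allocated page */
		if (i == nr) {
			memset(&map[start], 1, nr);
			return start;
		}
	}
	return -1;
}

static void iova_bitmap_free(unsigned long start, unsigned long nr)
{
	memset(&map[start], 0, nr);
}

int main(void)
{
	long a = iova_bitmap_alloc(10);	/* gets [0, 10)                 */
	long b = iova_bitmap_alloc(10);	/* skips [10, 20), which would  */
					/* cross the boundary at page   */
					/* 16, and gets [16, 26)        */
	printf("a = %ld, b = %ld\n", a, b);	/* prints a = 0, b = 16 */
	iova_bitmap_free(a, 10);
	iova_bitmap_free(b, 10);
	return 0;
}

Note the linear scan costs O(pages) per mapping, whereas the RB tree
trades that for a per-mapping allocation plus tree rebalancing; one way
to read the IOPS numbers above is that the bitmap's walk comes out
cheaper at this I/O size.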