Date: Fri, 6 Jun 2008 23:21:19 +0300
From: Muli Ben-Yehuda
To: Grant Grundler
Cc: FUJITA Tomonori, linux-kernel@vger.kernel.org,
	mgross@linux.intel.com, linux-scsi@vger.kernel.org
Subject: Re: Intel IOMMU (and IOMMU for Virtualization) performances
Message-ID: <20080606202119.GQ15085@il.ibm.com>
References: <20080604235053K.fujita.tomonori@lab.ntt.co.jp>

On Wed, Jun 04, 2008 at 11:06:15AM -0700, Grant Grundler wrote:
> On Wed, Jun 4, 2008 at 7:47 AM, FUJITA Tomonori wrote:
> ...
> > Now I'm trying to fix the Intel IOMMU code, the free space
> > management algorithm.
> >
> > The major difference between the Intel IOMMU code and the others is
> > that the Intel IOMMU code uses a red-black tree to manage free space
> > while the others use a bitmap (swiotlb is the only exception).
> >
> > The red-black tree method consumes less memory than the bitmap
> > method, but it incurs more overhead (the RB tree method needs to
> > walk the tree, allocate a new item, and insert it every time it maps
> > an I/O address). The Intel IOMMU (and IOMMUs for virtualization)
> > needs multiple IOMMU address spaces. That's why the red-black tree
> > method was chosen, I guess.
>
> It's possible to split up one flat address space and share the IOMMU
> among several users. Each user gets her own segment of the bitmap and
> a corresponding IO Pdir. So I don't see the allocation policy as a
> strong reason to use a red/black tree.

Do you mean multiple users sharing the same I/O address space (but
each user using a different segment), or multiple users, each with its
own I/O address space, but only using a specific segment of that
address space and using a single bitmap to represent free space in all
segments?

If the former, then you are losing some of the benefit of the IOMMU,
since all users can DMA to other users' areas (same I/O address
space). If the latter, having a bitmap per I/O address space seems
simpler and would have the same memory consumption.
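For concreteness, a per-address-space bitmap allocator could look
roughly like the sketch below. This is plain C with made-up names,
illustrative only -- not the actual intel-iommu, swiotlb, or any other
in-tree code. The point is the cost profile: allocation is a linear
bitmap scan plus setting bits, with no per-mapping node allocation or
tree rebalancing.

#include <stdio.h>
#include <stdlib.h>

#define BITS_PER_LONG	(8 * sizeof(unsigned long))

/* One of these per I/O address space (illustrative only). */
struct iova_space {
	unsigned long *map;	/* one bit per IOVA page, 1 = in use */
	unsigned long nr_pages;	/* size of the address space in pages */
	unsigned long next;	/* next-fit hint */
};

static int map_test(const unsigned long *map, unsigned long i)
{
	return (map[i / BITS_PER_LONG] >> (i % BITS_PER_LONG)) & 1UL;
}

static void map_set(unsigned long *map, unsigned long i, int val)
{
	if (val)
		map[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
	else
		map[i / BITS_PER_LONG] &= ~(1UL << (i % BITS_PER_LONG));
}

/*
 * Allocate 'n' contiguous IOVA pages: scan the bitmap, set the bits.
 * Returns the starting page index, or -1 if no free range was found
 * (a real allocator would wrap around to 0 and retry before failing).
 */
static long iova_alloc(struct iova_space *s, unsigned long n)
{
	unsigned long start, i;

	for (start = s->next; start + n <= s->nr_pages; start++) {
		for (i = 0; i < n; i++)
			if (map_test(s->map, start + i))
				break;
		if (i == n) {
			for (i = 0; i < n; i++)
				map_set(s->map, start + i, 1);
			s->next = start + n;
			return (long)start;
		}
		start += i;	/* resume the scan after the busy bit */
	}
	return -1;
}

static void iova_free(struct iova_space *s, unsigned long start,
		      unsigned long n)
{
	while (n--)
		map_set(s->map, start + n, 0);
}

int main(void)
{
	struct iova_space s;

	s.nr_pages = 1 << 16;	/* 64K pages, i.e. a 256MB IOVA space */
	s.map = calloc(s.nr_pages / BITS_PER_LONG, sizeof(unsigned long));
	s.next = 0;

	printf("8-page range at page %ld\n", iova_alloc(&s, 8));
	printf("4-page range at page %ld\n", iova_alloc(&s, 4));
	iova_free(&s, 0, 8);
	free(s.map);
	return 0;
}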
> > I got the following results with one thread issuing 1KB I/Os:
> >
> >                        IOPS (I/O per second)
> > IOMMU disabled         145253.1   (1.000)
> > RB tree (mainline)     118313.0   (0.814)
> > Bitmap                 128954.1   (0.887)
>
> Just to make this clear, this is a 10% performance difference.
>
> But a second metric is more telling: CPU utilization. How much time
> was spent in the IOMMU code for each implementation with the same
> workload?
>
> This isn't a demand for that information, but just a request to
> measure it in any future benchmarking. oprofile or perfmon2 are the
> best tools to determine that.

Agreed, CPU utilization would be very interesting here.

> Just as important as the allocation data structure is the allocation
> policy. The allocation policy will perform best if it matches the IO
> TLB replacement implemented in the IOMMU HW. Thrashing the IO TLB by
> allocating aliases to competing streams will hurt perf as well.
> Obviously a single benchmark is unlikely to detect this.

Is there a public description of the caching policies of currently
available VT-d hardware?

> I've never been able to come up with a good heuristic for determining
> the size of the IOVA space. It generally does NOT need to map all of
> Host Physical RAM. The actual requirement depends entirely on the
> workload, and the type and number of IO devices installed. The
> problem is we don't know any of those things until well after the
> IOMMU is already needed.

Why not do what hash-table implementations do: start small and resize
when we approach half full?

Cheers,
Muli
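A rough illustration of that resize-at-half-full idea, again in plain
C with made-up names rather than kernel code: unlike a hash table,
nothing has to be rehashed when the space grows, since
already-allocated IOVAs stay valid; only the bitmap (and, in a real
implementation, the IOMMU page table) needs to be extended.

#include <stdlib.h>
#include <string.h>

#define BITS_PER_LONG	(8 * sizeof(unsigned long))

struct iova_space {
	unsigned long *map;	/* one bit per IOVA page, 1 = in use */
	unsigned long nr_pages;	/* current size of the space in pages */
	unsigned long used;	/* pages currently allocated */
};

/*
 * Double the IOVA space once it is more than half used, hash-table
 * style.  Assumes nr_pages is a power of two and a multiple of
 * BITS_PER_LONG.  Existing allocations are untouched; only the new
 * tail of the bitmap is cleared.
 */
static int iova_maybe_grow(struct iova_space *s)
{
	unsigned long old_longs = s->nr_pages / BITS_PER_LONG;
	unsigned long new_longs = old_longs * 2;
	unsigned long *new_map;

	if (s->used * 2 < s->nr_pages)
		return 0;	/* still less than half full, nothing to do */

	new_map = realloc(s->map, new_longs * sizeof(unsigned long));
	if (!new_map)
		return -1;	/* keep the old, smaller space on failure */

	memset(new_map + old_longs, 0,
	       (new_longs - old_longs) * sizeof(unsigned long));
	s->map = new_map;
	s->nr_pages *= 2;
	return 0;
}

The allocator would call this after each successful allocation (or on
an allocation failure), so the space starts small and only grows to
match what the workload actually needs.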