Subject: Re: Intel IOMMU (and IOMMU for Virtualization) performances
From: James Bottomley
To: Grant Grundler
Cc: FUJITA Tomonori, linux-kernel@vger.kernel.org, mgross@linux.intel.com,
    linux-scsi@vger.kernel.org
Date: Thu, 05 Jun 2008 14:01:28 -0500
Message-Id: <1212692488.4241.8.camel@localhost.localdomain>
References: <20080604235053K.fujita.tomonori@lab.ntt.co.jp>
    <20080605235322L.fujita.tomonori@lab.ntt.co.jp>

On Thu, 2008-06-05 at 11:34 -0700, Grant Grundler wrote:
> On Thu, Jun 5, 2008 at 7:49 AM, FUJITA Tomonori wrote:
> ...
> >> You can easily emulate SSD drives by doing sequential 4K reads
> >> from a normal SATA HD. That should result in ~7-8K IOPS since the disk
> >> will recognize the sequential stream and read ahead. SAS/SCSI/FC will
> >> probably work the same way with different IOP rates.
> >
> > Yeah, probably right. I thought that 10GbE gives the IOMMU a heavier
> > workload than an SSD does, so I tried to emulate something like that.
>
> 10GbE might exercise a different code path. NICs typically use map_single

map_page, actually, but effectively the same thing. However, all they're
really doing is their own implementation of sg list mapping.

> and storage devices typically use map_sg. But they both exercise the same
> underlying resource management code since it's the same IOMMU they poke at.
>
> ...
> >> Sorry, I didn't see a replacement for the deferred_flush_tables.
> >> Mark Gross and I agree this substantially helps with unmap performance.
> >> See http://lkml.org/lkml/2008/3/3/373
> >
> > Yeah, I can add a nice trick that parisc's sba_iommu uses. I'll try it
> > next time.
> >
> > But it probably gives the bitmap method less gain than the RB tree,
> > since clearing the bitmap takes less time than updating the tree.
> >
> > The deferred_flush_tables mechanism also batches the TLB flushes. The
> > patch flushes the TLB only when it reaches the end of the bitmap (a
> > trick that some IOMMUs, like SPARC's, use).
>
> The batching of the TLB flushes is the key thing. I was being paranoid
> by not marking the resource free until after the TLB was flushed. If we
> know the allocation is going to be circular through the bitmap, flushing
> the TLB once per iteration through the bitmap should be sufficient, since
> we can guarantee the IO Pdir resource won't get re-used until a full
> cycle through the bitmap has been completed.

Not necessarily ... there's a safety vs. performance issue here. As long
as the IOTLB mapping persists, the device can use it to write to that
memory. If you fail to flush, you lose the ability to detect device DMA
after free (because the IOTLB entry may still be valid). On standard
systems this happens so infrequently as to be worth the tradeoff.
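To make that concrete, here's a rough userspace toy of the circular-bitmap
scheme with the flush deferred to the wrap point. Every name below is made
up for illustration; this is not the intel-iommu, sba_iommu or SPARC code,
just a sketch of the idea under discussion:

/*
 * Toy model: a circular bitmap of IOVA slots where freeing a slot does
 * NOT flush the IOTLB; the flush happens only when the allocation
 * cursor wraps back to the start, i.e. once per pass over the bitmap.
 */
#include <stdio.h>

#define MAP_BITS 64                     /* one bit per IOVA page */

static unsigned long long map;          /* 1 = slot in use */
static unsigned int next_bit;           /* circular allocation cursor */
static unsigned int flushes;            /* how often we hit the hardware */

static void iotlb_flush_all(void)       /* stand-in for the real flush */
{
        flushes++;
}

static int iova_alloc(void)
{
        unsigned int tried;

        for (tried = 0; tried < MAP_BITS; tried++) {
                unsigned int bit = next_bit;

                if (bit == 0)
                        /*
                         * About to start re-using slots from the bottom
                         * of the map: anything freed since the last wrap
                         * may still have a live IOTLB entry, so purge
                         * once here.
                         */
                        iotlb_flush_all();

                next_bit = (next_bit + 1) % MAP_BITS;

                if (!(map & (1ULL << bit))) {
                        map |= 1ULL << bit;
                        return (int)bit;
                }
        }
        return -1;                      /* map exhausted */
}

static void iova_free(unsigned int bit)
{
        /* Mark the slot free immediately; the IOTLB entry may stay stale. */
        map &= ~(1ULL << bit);
}

int main(void)
{
        int i;

        for (i = 0; i < 200; i++) {
                int bit = iova_alloc();

                if (bit >= 0)
                        iova_free((unsigned int)bit);
        }
        printf("%d map/unmap pairs, %u IOTLB flushes\n", i, flushes);
        return 0;
}

That prints "200 map/unmap pairs, 4 IOTLB flushes", which is the whole
attraction: most unmaps never touch the hardware. The cost is exactly the
window described above, because every freed slot keeps a potentially valid
IOTLB entry until the next wrap.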
However, in virtualised systems, which is what the Intel IOMMU is aimed
at, stale IOTLB entries can be used by malicious VMs to gain access to
memory outside their VM, so the Intel people at least need to say whether
they're willing to accept this speed-for-safety tradeoff.

James
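P.S. For completeness, a toy sketch of the two unmap policies being weighed
here: strict (flush before the IOVA slot can be re-used, which closes the
stale-entry window a malicious guest could exploit) versus deferred (batch
the flushes for speed). Again, every name is hypothetical; this is not the
actual intel-iommu interface:

#include <stdbool.h>
#include <stdio.h>

static bool strict_unmap = true;        /* would be a boot/module option */

static void iotlb_flush_all(void)
{
        puts("  IOTLB flushed");
}

static void iova_mark_free(unsigned long iova)
{
        printf("  IOVA %#lx free for immediate re-use\n", iova);
}

static void queue_deferred_free(unsigned long iova)
{
        /* A real driver would batch these and flush once per batch. */
        printf("  IOVA %#lx queued; stale IOTLB entry may persist\n", iova);
}

static void toy_iommu_unmap(unsigned long iova)
{
        printf("unmap %#lx (%s)\n", iova,
               strict_unmap ? "strict" : "deferred");
        if (strict_unmap) {
                iotlb_flush_all();      /* stale entry gone before re-use */
                iova_mark_free(iova);
        } else {
                queue_deferred_free(iova);
        }
}

int main(void)
{
        toy_iommu_unmap(0x10000UL);
        strict_unmap = false;           /* the fast-but-unsafe variant */
        toy_iommu_unmap(0x11000UL);
        return 0;
}

Strict mode costs an IOTLB flush on every unmap; deferred mode leaves the
window open until the batch is flushed. Which default is acceptable for
the virtualised case is exactly the question above.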