Message-ID: <4D9524DF.10405@sgi.com>
Date: Thu, 31 Mar 2011 20:05:35 -0500
From: Mike Habeck
To: Chris Wright
Cc: Mike Travis, David Woodhouse, Jesse Barnes,
    iommu@lists.linux-foundation.org, linux-pci@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/4] Intel pci: Limit dmar_init_reserved_ranges
In-Reply-To: <20110331235657.GG18712@sequoia.sous-sol.org>

Chris Wright wrote:
> * Mike Habeck (habeck@sgi.com) wrote:
>> On 03/31/2011 06:25 PM, Mike Travis wrote:
>>> I'll probably need help from our Hardware PCI Engineer to help explain
>>> this further, though here's a pointer to an earlier email thread:
>>>
>>> http://marc.info/?l=linux-kernel&m=129259816925973&w=2
>>>
>>> I'll also dig out the specs you're asking for.
>>>
>>> Thanks,
>>> Mike
>>>
>>> Chris Wright wrote:
>>>> * Mike Travis (travis@sgi.com) wrote:
>>>>> Chris - did you have any comment on this patch?
>>>> It doesn't actually look right to me. It means that particular range
>>>> is no longer reserved. But perhaps I've misunderstood something.
>>>>
>>>>> Mike Travis wrote:
>>>>>> dmar_init_reserved_ranges() reserves the card's MMIO ranges to
>>>>>> prevent handing out a DMA map that would overlap with the MMIO range.
>>>>>> The problem is that while the Nvidia GPU has 64bit BARs and is
>>>>>> capable of receiving > 40bit PIOs, it can't generate > 40bit DMAs.
>>>> I don't understand what you mean here.
>> What Mike is getting at is there is no reason to reserve the MMIO
>> range if it's greater than the dma_mask, given the MMIO range is
>> outside of what the IOVA code will ever hand back to the IOMMU
>> code. In this case the nVidia card has a 64bit BAR and is assigned
>> the MMIO range [0xf8200000000 - 0xf820fffffff]. But the Nvidia
>> card can only generate a 40bit DMA (thus has a 40bit dma_mask). If
>> the IOVA code honors the limit_pfn (i.e., dma_mask) passed in, it
>> will never hand a >40bit address back to the IOMMU code. Thus
>> there is no reason to reserve the card's MMIO range if it is greater
>> than the dma_mask. (And that is what the patch is doing).
>
> The reserved ranges are for all devices. Another device with a 64bit
> dma_mask could get that region if it's not properly reserved. The
> driver would then program that device to dma to an address that is an
> alias to an MMIO region. The memory transaction travels up towards
> root...and sees the MMIO range in some bridge and would go straight down
> to the GPU.

Chris,

OK, I understand now what you meant by the patch possibly causing the
DMA transaction to become a peer to peer transaction. Mike and I will
have to rethink this one. Thanks for your input.

-mike

>
>> More below...
>>
>>>>>> So when the iommu code reserves these MMIO ranges a > 40bit
>>>>>> entry ends up getting in the rbtree.
>>>>>> On a UV test system with
>>>>>> the Nvidia cards, the BARs are:
>>>>>>
>>>>>> 0001:36:00.0 VGA compatible controller: nVidia Corporation GT200GL
>>>>>> Region 0: Memory at 92000000 (32-bit, non-prefetchable) [size=16M]
>>>>>> Region 1: Memory at f8200000000 (64-bit, prefetchable) [size=256M]
>>>>>> Region 3: Memory at 90000000 (64-bit, non-prefetchable) [size=32M]
>>>>>>
>>>>>> So this 44bit MMIO address 0xf8200000000 ends up in the rbtree. As DMA
>>>>>> maps get added and deleted from the rbtree we can end up getting a cached
>>>>>> entry to this 0xf8200000000 entry... this is what results in the code
>>>>>> handing out the invalid DMA map of 0xf81fffff000:
>>>>>>
>>>>>> [ 0xf8200000000-1 >> PAGE_SHIFT << PAGE_SHIFT ]
>>>>>>
>>>>>> The IOVA code needs to better honor the "limit_pfn" when allocating
>>>>>> these maps.
>>>> This means we could get the MMIO address range (it's no longer reserved).
>> Not true, the MMIO address is greater than the dma_mask (i.e., the
>> limit_pfn passed into alloc_iova()), thus the IOVA code will never
>> hand back that address range given it's greater than the dma_mask.
>
> Well, as you guys are seeing, the iova allocation code is making the
> assumption that if the range is in the tree, it's valid. And it is
> handing out an address that's too large.
>
>>>> It seems to me the DMA transaction would then become a peer to peer
>>>> transaction if ACS is not enabled, which could show up as a random
>>>> register write in that GPU's 256M BAR (i.e. broken).
>>>>
>>>> The iova allocation should not hand out an address bigger than the
>>>> dma_mask. What is the device's dma_mask?
>> Agree. But there is a bug. The IOVA doesn't validate the limit_pfn
>> if it uses the cached entry. One could argue that it should validate
>> the limit_pfn, but then again an entry outside the limit_pfn should
>> have never got into the rbtree... (it got in due to the IOMMU's
>> dmar_init_reserved_ranges() adding it).
>
> Yeah, I think it needs to be in the global reserved list. But perhaps
> not copied into the domain specific iova. Or simply skipped on iova
> allocation (don't just assume rb_last is <= dma_mask).
>
> thanks,
> -chris
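
For readers following the thread, here is a small user-space sketch of the last suggestion above: keep the MMIO range reserved, but skip any entry that lies above the device's limit_pfn when allocating. It is not the kernel's IOVA code. A sorted array stands in for the rbtree, and the names pfn_range, naive_below_last, alloc_below and ALLOC_FAIL are all hypothetical. The 4 KiB page size is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12          /* assumption: 4 KiB pages */
#define ALLOC_FAIL UINT64_MAX  /* hypothetical failure value */

/* Inclusive range of page frame numbers (hypothetical type). */
struct pfn_range {
    uint64_t pfn_lo;
    uint64_t pfn_hi;
};

/*
 * Model of the symptom only: allocate directly below the highest entry
 * and never compare the result against limit_pfn.
 */
static uint64_t naive_below_last(const struct pfn_range *r, size_t n,
                                 uint64_t size)
{
    return r[n - 1].pfn_lo - size;
}

/*
 * Top-down allocation of `size` pages that honors limit_pfn.
 * `r` holds n reserved or allocated ranges, sorted ascending and
 * non-overlapping. Returns the lowest pfn of the new allocation,
 * or ALLOC_FAIL if nothing fits at or below limit_pfn.
 */
static uint64_t alloc_below(const struct pfn_range *r, size_t n,
                            uint64_t size, uint64_t limit_pfn)
{
    uint64_t top = limit_pfn;  /* highest pfn still usable */
    size_t i = n;

    assert(size > 0);
    while (i-- > 0) {
        if (r[i].pfn_lo > top)
            continue;          /* entry is entirely above the limit: skip it */
        if (r[i].pfn_hi < top && top - r[i].pfn_hi >= size)
            return top - size + 1;  /* gap above this entry is big enough */
        if (r[i].pfn_lo == 0)
            return ALLOC_FAIL;      /* nothing left below this entry */
        top = r[i].pfn_lo - 1;      /* continue searching below this entry */
    }
    return (top >= size - 1) ? top - size + 1 : ALLOC_FAIL;
}
```

With the single reserved range { 0xf8200000, 0xf820ffff } (the GPU's 256M BAR expressed in pfns), naive_below_last() for one page returns pfn 0xf81fffff, which is the address 0xf81fffff000 reported in the thread. alloc_below() with the 40bit limit pfn 0xfffffff skips the reserved entry and returns 0xfffffff instead. Because the range stays in the list, a device with a larger limit is still kept out of the MMIO window.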