From: Alexander Duyck
Date: Thu, 8 Jun 2023 11:15:52 -0700
Subject: Re: Question about reserved_regions w/ Intel IOMMU
To: Ashok Raj
Cc: Ashok Raj, Baolu Lu, LKML, linux-pci, iommu@lists.linux.dev

On Thu, Jun 8, 2023 at 10:52 AM Ashok Raj wrote:
>
> On Thu, Jun 08, 2023 at 10:10:54AM -0700, Alexander Duyck wrote:
> > On Thu, Jun 8, 2023 at 8:40 AM Ashok Raj wrote:
> > >
> > > On Thu, Jun 08, 2023 at 07:33:31AM -0700, Alexander Duyck wrote:
> > > > On Wed, Jun 7, 2023
> > > > at 8:05 PM Baolu Lu wrote:
> > > > >
> > > > > On 6/8/23 7:03 AM, Alexander Duyck wrote:
> > > > > > On Wed, Jun 7, 2023 at 3:40 PM Alexander Duyck
> > > > > > wrote:
> > > > > >>
> > > > > >> I am running into a DMA issue that appears to be a conflict between
> > > > > >> ACS and IOMMU. As per the documentation I can find, the IOMMU is
> > > > > >> supposed to create reserved regions for MSI and the memory window
> > > > > >> behind the root port. However looking at reserved_regions I am not
> > > > > >> seeing that. I only see the reservation for the MSI.
> > > > > >>
> > > > > >> So for example with an enabled NIC and iommu enabled w/o passthru I am seeing:
> > > > > >> # cat /sys/bus/pci/devices/0000\:83\:00.0/iommu_group/reserved_regions
> > > > > >> 0x00000000fee00000 0x00000000feefffff msi
> > > > > >>
> > > > > >> Shouldn't there also be a memory window for the region behind the root
> > > > > >> port to prevent any possible peer-to-peer access?
> > > > > >
> > > > > > Since the iommu portion of the email bounced I figured I would fix
> > > > > > that and provide some additional info.
> > > > > >
> > > > > > I added some instrumentation to the kernel to dump the resources found
> > > > > > in iova_reserve_pci_windows. From what I can tell it is finding the
> > > > > > correct resources for the Memory and Prefetchable regions behind the
> > > > > > root port. It seems to be calling reserve_iova which is successfully
> > > > > > allocating an iova to reserve the region.
> > > > > >
> > > > > > However still no luck on why it isn't showing up in reserved_regions.
> > > > >
> > > > > Perhaps I can ask the opposite question, why it should show up in
> > > > > reserve_regions? Why does the iommu subsystem block any possible peer-
> > > > > to-peer DMA access? Isn't that a decision of the device driver.
> > > > >
> > > > > The iova_reserve_pci_windows() you've seen is for kernel DMA interfaces
> > > > > which is not related to peer-to-peer accesses.
> > > >
> > > > The problem is if the IOVA overlaps with the physical addresses of
> > > > other devices that can be routed to via ACS redirect. As such if ACS
> > > > redirect is enabled a host IOVA could be directed to another device on
> > > > the switch instead. To prevent that we need to reserve those addresses
> > > > to avoid address space collisions.
> >
> > Our test case is just to perform DMA to/from the host on one device on
> > a switch and what we are seeing is that when we hit an IOVA that
> > matches up with the physical address of the neighboring devices BAR0
> > then we are seeing an AER followed by a hot reset.
>
> ACS is always confusing.. Does your NIC have a DTLB?

No. It is using the IOMMU for all address translation. I am also
pushing back on the test being used as well. It is always possible
they have implemented something incorrectly and are overrunning a
buffer going into the reserved IOVA region and the overlap is just a
coincidence.

> If request redirect is set, and the Egress is enabled, then all
> transactions should go upstream to the root-port->IOMMU before being
> served.
>
> In my 6.0 spec its in 6.12.3 ACS Peer-to-Peer Control Interactions?
>
> And maybe lspci would show how things are setup in the switch?

We were setting the Redirect Request only, no Egress. I agree, based
on the config everything should just go upstream. However if we
eliminate the switch or put things in passthrough mode the problem
goes away.

> >
> > > Any untranslated address from a device must be forwarded to the IOMMU when
> > > ACS is enabled correct?I guess if you want true p2p, then you would need
> > > to map so that the hpa turns into the peer address.. but its always a round
> > > trip to IOMMU.
> >
> > This assumes all parts are doing the Request Redirect "correctly".
> > In our case there is a PCIe switch we are trying to debug and we have a
> > few working theories. One concern I have is that the switch may be
> > throwing an ACS violation for us using an address that matches a
> > neighboring device instead of redirecting it to the upstream port. If
> > we pull the switch and just run on the root complex the issue seems to
> > be resolved so I started poking into the code which led me to the
> > documentation pointing out what is supposed to be reserved based on
> > the root complex and MSI regions.
> >
> > As a part of going down that rabbit hole I realized that the
> > reserved_regions seems to only list the MSI reservation. However after
> > digging a bit deeper it seems like there is code to reserve the memory
> > behind the root complex in the IOVA but it doesn't look like that is
> > visible anywhere and is the piece I am currently trying to sort out.
> > What I am working on is trying to figure out if the system that is
> > failing is actually reserving that memory region in the IOVA, or if
> > that is somehow not happening in our test setup.
>
> I suspect with IOMMU, there is no need to pluck holes like we do for the
> MSI. In very early code in IOMMU i vaguely recall we did that, but our
> knowledge on ACS was weak. (not that has improved :-)).

The hole has to do mostly with avoiding any possibility of misrouting
things, or at least that was my understanding after reading it.

> Knowing how the switch and root ports are setup with forwarding may help
> with some clues. The easy option is maybe forcibly adding to the reserved
> range may help to see if you don't see the ACS violation.
>
> Baolu might have some better ideas.

I'm working with the team having the issue to try and verify that now.
In theory it should already be reserved so I am working with them to
check that.

Thanks,

- Alex
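
For reference, the check discussed in this thread (is a neighboring
device's BAR covered by anything listed in reserved_regions?) could be
sketched roughly as below. This is only an illustration, not kernel code
and not the method used by anyone in the thread. It assumes the
"start end type" line format shown in the quoted sysfs output, with an
inclusive end address. The names Region, parse_reserved_regions and
is_covered are hypothetical, and the peer BAR address is made up.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    start: int
    end: int    # inclusive, as in the sysfs output quoted in the thread
    kind: str   # e.g. "msi"


def parse_reserved_regions(text: str) -> List[Region]:
    """Parse lines of the form '0x<start> 0x<end> <type>'."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        regions.append(Region(int(fields[0], 16), int(fields[1], 16), fields[2]))
    return regions


def is_covered(start: int, end: int, regions: List[Region]) -> bool:
    """Return True if every address in [start, end] lies in some region."""
    cursor = start  # lowest address not yet known to be reserved
    for region in sorted(regions):
        if region.start > cursor:
            break  # gap found before the range is fully covered
        if region.end >= cursor:
            cursor = region.end + 1
        if cursor > end:
            return True
    return cursor > end


# Sample text copied from the thread; on a live system it would be read
# from the iommu_group/reserved_regions file of the device in question.
sample = "0x00000000fee00000 0x00000000feefffff msi\n"
regions = parse_reserved_regions(sample)

# Hypothetical BAR0 window of a neighboring device (invented address).
peer_bar_start, peer_bar_end = 0xF0000000, 0xF00FFFFF
print(is_covered(peer_bar_start, peer_bar_end, regions))
```

With only the MSI entry present, a BAR window outside 0xfee00000 to
0xfeefffff is reported as not covered, which matches the observation
that reserved_regions lists nothing for the memory behind the root port.
Note that this only inspects what sysfs exposes; ranges reserved
internally via reserve_iova would not be visible to such a check.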