From: Alexander Duyck
Date: Thu, 8 Jun 2023 10:10:54 -0700
Subject: Re: Question about reserved_regions w/ Intel IOMMU
To: Ashok Raj
Cc: Baolu Lu, LKML, linux-pci, iommu@lists.linux.dev, Ashok Raj
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jun 8, 2023 at 8:40 AM Ashok Raj wrote:
>
> On Thu, Jun 08, 2023 at 07:33:31AM -0700, Alexander Duyck wrote:
> > On Wed, Jun 7, 2023 at 8:05 PM Baolu Lu wrote:
> > >
> > > On 6/8/23 7:03 AM, Alexander Duyck wrote:
> > > > On Wed, Jun 7, 2023 at 3:40 PM Alexander Duyck
> > > > wrote:
> > > >>
> > > >> I am running into a DMA issue that appears to be a conflict between
> > > >> ACS and IOMMU. As per the documentation I can find, the IOMMU is
> > > >> supposed to create reserved regions for MSI and the memory window
> > > >> behind the root port. However looking at reserved_regions I am not
> > > >> seeing that. I only see the reservation for the MSI.
> > > >>
> > > >> So for example with an enabled NIC and iommu enabled w/o passthru I am seeing:
> > > >> # cat /sys/bus/pci/devices/0000\:83\:00.0/iommu_group/reserved_regions
> > > >> 0x00000000fee00000 0x00000000feefffff msi
> > > >>
> > > >> Shouldn't there also be a memory window for the region behind the root
> > > >> port to prevent any possible peer-to-peer access?
> > > >
> > > > Since the iommu portion of the email bounced I figured I would fix
> > > > that and provide some additional info.
> > > >
> > > > I added some instrumentation to the kernel to dump the resources found
> > > > in iova_reserve_pci_windows. From what I can tell it is finding the
> > > > correct resources for the Memory and Prefetchable regions behind the
> > > > root port. It seems to be calling reserve_iova which is successfully
> > > > allocating an iova to reserve the region.
> > > >
> > > > However still no luck on why it isn't showing up in reserved_regions.
> > >
> > > Perhaps I can ask the opposite question, why it should show up in
> > > reserve_regions? Why does the iommu subsystem block any possible
> > > peer-to-peer DMA access? Isn't that a decision of the device driver.
> > >
> > > The iova_reserve_pci_windows() you've seen is for kernel DMA interfaces
> > > which is not related to peer-to-peer accesses.
> >
> > The problem is if the IOVA overlaps with the physical addresses of
> > other devices that can be routed to via ACS redirect. As such if ACS
> > redirect is enabled a host IOVA could be directed to another device on
> > the switch instead. To prevent that we need to reserve those addresses
> > to avoid address space collisions.

Our test case is just to perform DMA to/from the host on one device on
a switch. What we are seeing is that when we hit an IOVA that matches
the physical address of the neighboring device's BAR0, we get an AER
followed by a hot reset.

> Any untranslated address from a device must be forwarded to the IOMMU when
> ACS is enabled correct? I guess if you want true p2p, then you would need
> to map so that the hpa turns into the peer address.. but it's always a round
> trip to IOMMU.

This assumes all parts are doing the Request Redirect "correctly". In
our case there is a PCIe switch we are trying to debug and we have a
few working theories. One concern I have is that the switch may be
throwing an ACS violation when we use an address that matches a
neighboring device, instead of redirecting it to the upstream port.

If we pull the switch and just run on the root complex the issue seems
to be resolved, so I started poking into the code. That led me to the
documentation pointing out what is supposed to be reserved based on
the root complex and MSI regions.

As a part of going down that rabbit hole I realized that
reserved_regions seems to only list the MSI reservation. After digging
a bit deeper, it looks like there is code to reserve the memory behind
the root complex in the IOVA, but that reservation doesn't appear to
be visible anywhere, and that is the piece I am currently trying to
sort out. What I am working on is figuring out whether the failing
system is actually reserving that memory region in the IOVA, or
whether that is somehow not happening in our test setup.
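
As an illustration of the check being described, here is a minimal sketch that parses reserved_regions output in the format shown earlier in the thread and tests whether a given address window is covered by it. The helper names and the example window 0xc8000000-0xc80fffff are hypothetical and not taken from the failing system. The sketch also only reflects what sysfs reports: as noted above, ranges reserved through iova_reserve_pci_windows do not seem to show up there, so a "not covered" result does not prove the IOVA allocator would hand out that range.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    start: int
    end: int    # inclusive end address, as printed by sysfs
    kind: str   # region type string, e.g. "msi"


def parse_reserved_regions(text: str) -> List[Region]:
    """Parse lines of the form '<start> <end> <type>' with hex addresses."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        regions.append(Region(int(fields[0], 16), int(fields[1], 16), fields[2]))
    return regions


def overlapping(regions: List[Region], start: int, end: int) -> List[Region]:
    """Return every region that intersects the inclusive window [start, end]."""
    return [r for r in regions if r.start <= end and start <= r.end]


def is_fully_reserved(regions: List[Region], start: int, end: int) -> bool:
    """True if [start, end] is covered by the reserved regions with no gap."""
    cursor = start
    for r in sorted(regions):       # sorts by start address first
        if r.end < cursor:
            continue                # region lies entirely below the cursor
        if r.start > cursor:
            break                   # gap: nothing covers the cursor address
        cursor = r.end + 1
        if cursor > end:
            return True
    return False


if __name__ == "__main__":
    # Sample taken from the reserved_regions output quoted in this thread.
    sample = "0x00000000fee00000 0x00000000feefffff msi\n"
    regions = parse_reserved_regions(sample)
    # Hypothetical window standing in for a neighboring device's BAR0.
    bar_start, bar_end = 0xC8000000, 0xC80FFFFF
    print(is_fully_reserved(regions, bar_start, bar_end))
```

On a live system the text could be read from the iommu_group/reserved_regions file of the device in question, and the window could come from the bridge's memory and prefetchable ranges.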