Subject: Re: warning from domain_get_iommu
To: iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org
References: <20200204200714.u4ezhi6vhqhxog6e@cantor> <20200206174358.shzhieijle5wdshr@cantor> <20200207093413.oy4tclbrb3vqs3vz@cantor> <20200208101952.mrawika2wf4kwt3y@cantor>
From: Lu Baolu
Message-ID: <3543c012-4912-283f-8945-98d434c1e7a1@linux.intel.com>
Date: Sun, 9 Feb 2020 13:54:14 +0800
In-Reply-To: <20200208101952.mrawika2wf4kwt3y@cantor>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On 2020/2/8 18:19, Jerry Snitselaar wrote:
> On Sat Feb 08 20, Lu Baolu wrote:
>> Hi Jerry,
>>
>> On 2020/2/7 17:34, Jerry Snitselaar wrote:
>>> On Thu Feb 06 20, Jerry Snitselaar wrote:
>>>> On Tue Feb 04 20, Jerry Snitselaar wrote:
>>>>> I'm working on getting a system to reproduce this, and verify it
>>>>> also occurs
>>>>> with 5.5, but I have a report of a case where the kdump kernel gives
>>>>> warnings like the following on a hp dl360 gen9:
>>>>>
>>>>> [    2.830589] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI)
>>>>> Driver
>>>>> [    2.832615] ehci-pci: EHCI PCI platform driver
>>>>> [    2.834190] ehci-pci 0000:00:1a.0: EHCI Host Controller
>>>>> [    2.835974] ehci-pci 0000:00:1a.0: new USB bus registered,
>>>>> assigned bus number 1
>>>>> [    2.838276] ehci-pci 0000:00:1a.0: debug port 2
>>>>> [    2.839700] WARNING: CPU: 0 PID: 1 at
>>>>> drivers/iommu/intel-iommu.c:598 domain_get_iommu+0x55/0x60
>>>>> [    2.840671] Modules linked in:
>>>>> [    2.840671] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
>>>>> 4.18.0-170.el8.kdump2.x86_64 #1
>>>>> [    2.840671] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360
>>>>> Gen9, BIOS P89 07/21/2019
>>>>> [    2.840671] RIP: 0010:domain_get_iommu+0x55/0x60
>>>>> [    2.840671] Code: c2 01 eb 0b 48 83 c0 01 8b 34 87 85 f6 75 0b
>>>>> 48 63 c8 48 39 c2 75 ed 31 c0 c3 48 c1 e1 03 48 8b 05 70 f3 91 01
>>>>> 48 8b 04 08 c3 <0f> 0b 31 c0 c3 31 c9 eb eb 66 90 0f 1f 44 00 00 41
>>>>> 55 40 0f b6 f6
>>>>> [    2.840671] RSP: 0018:ffffc900000dfab8 EFLAGS: 00010202
>>>>> [    2.840671] RAX: ffff88ec7f1c8000 RBX: 0000006c7c867000 RCX:
>>>>> 0000000000000000
>>>>> [    2.840671] RDX: 00000000fffffff0 RSI: 0000000000000000 RDI:
>>>>> ffff88ec7f1c8000
>>>>> [    2.840671] RBP: ffff88ec6f7000b0 R08: ffff88ec7f19d000 R09:
>>>>> ffff88ec7cbfcd00
>>>>> [    2.840671] R10: 0000000000000095 R11: ffffc900000df928 R12:
>>>>> 0000000000000000
>>>>> [    2.840671] R13: ffff88ec7f1c8000 R14: 0000000000001000 R15:
>>>>> 00000000ffffffff
>>>>> [    2.840671] FS:  0000000000000000(0000)
>>>>> GS:ffff88ec7f600000(0000) knlGS:0000000000000000
>>>>> [    2.840671] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [    2.840671] CR2: 00007ff3e1713000 CR3: 0000006c7de0a004 CR4:
>>>>> 00000000001606b0
>>>>> [    2.840671] Call Trace:
>>>>> [    2.840671]  __intel_map_single+0x62/0x140
>>>>> [    2.840671]  intel_alloc_coherent+0xa6/0x130
>>>>> [    2.840671]  dma_pool_alloc+0xd8/0x1e0
>>>>> [    2.840671]  e_qh_alloc+0x55/0x130
>>>>> [    2.840671]  ehci_setup+0x284/0x7b0
>>>>> [    2.840671]  ehci_pci_setup+0xa3/0x530
>>>>> [    2.840671]  usb_add_hcd+0x2b6/0x800
>>>>> [    2.840671]  usb_hcd_pci_probe+0x375/0x460
>>>>> [    2.840671]  local_pci_probe+0x41/0x90
>>>>> [    2.840671]  pci_device_probe+0x105/0x1b0
>>>>> [    2.840671]  driver_probe_device+0x12d/0x460
>>>>> [    2.840671]  device_driver_attach+0x50/0x60
>>>>> [    2.840671]  __driver_attach+0x61/0x130
>>>>> [    2.840671]  ? device_driver_attach+0x60/0x60
>>>>> [    2.840671]  bus_for_each_dev+0x77/0xc0
>>>>> [    2.840671]  ? klist_add_tail+0x3b/0x70
>>>>> [    2.840671]  bus_add_driver+0x14d/0x1e0
>>>>> [    2.840671]  ? ehci_hcd_init+0xaa/0xaa
>>>>> [    2.840671]  ? do_early_param+0x91/0x91
>>>>> [    2.840671]  driver_register+0x6b/0xb0
>>>>> [    2.840671]  ? ehci_hcd_init+0xaa/0xaa
>>>>> [    2.840671]  do_one_initcall+0x46/0x1c3
>>>>> [    2.840671]  ? do_early_param+0x91/0x91
>>>>> [    2.840671]  kernel_init_freeable+0x1af/0x258
>>>>> [    2.840671]  ? rest_init+0xaa/0xaa
>>>>> [    2.840671]  kernel_init+0xa/0xf9
>>>>> [    2.840671]  ret_from_fork+0x35/0x40
>>>>> [    2.840671] ---[ end trace e87b0d9a1c8135c4 ]---
>>>>> [    3.010848] ehci-pci 0000:00:1a.0: Using iommu dma mapping
>>>>> [    3.012551] ehci-pci 0000:00:1a.0: 32bit DMA uses non-identity
>>>>> mapping
>>>>> [    3.018537] ehci-pci 0000:00:1a.0: cache line size of 64 is not
>>>>> supported
>>>>> [    3.021188] ehci-pci 0000:00:1a.0: irq 18, io mem 0x93002000
>>>>> [    3.029006] ehci-pci 0000:00:1a.0: USB 2.0 started, EHCI 1.00
>>>>> [    3.030918] usb usb1: New USB device found, idVendor=1d6b,
>>>>> idProduct=0002, bcdDevice= 4.18
>>>>> [    3.033491] usb usb1: New USB device strings: Mfr=3, Product=2,
>>>>> SerialNumber=1
>>>>> [    3.035900] usb usb1: Product: EHCI Host Controller
>>>>> [    3.037423] usb usb1: Manufacturer: Linux
>>>>> 4.18.0-170.el8.kdump2.x86_64 ehci_hcd
>>>>> [    3.039691] usb usb1: SerialNumber: 0000:00:1a.0
>>>>>
>>>>> It looks like the device finishes initializing once it figures out it
>>>>> needs dma mapping instead of the default
>>>>> passthrough. intel_alloc_coherent calls iommu_need_mapping, before it
>>>>> calls __intel_map_single, so I'm not sure why it is tripping over the
>>>>> WARN_ON in domain_get_iommu.
>>>>>
>>>>> one thing I noticed while looking at this is that domain_get_iommu can
>>>>> return NULL. So should there be something like the following in
>>>>> __intel_map_single after the domain_get_iommu call?
>>>>>
>>>>> if (!iommu)
>>>>>     goto error;
>>>>>
>>>>> It is possible to deref the null pointer later otherwise.
>>>>>
>>>>> Regards,
>>>>> Jerry
>>>>
>>>> I reproduced the warning with a 5.5 kernel on an Intel NUC5i5MYBE.
>>>
>>> Hi Baolu,
>>>
>>> I think I understand what is happening here. With the kdump boot
>>> translation is pre-enabled, so in intel_iommu_add_device things are
>>> getting set to DEFER_DEVICE_DOMAIN_INFO. When intel_alloc_coherent
>>> calls iommu_need_mapping it returns true, but doesn't do the dma
>>> domain switch because of DEFER_DEVICE_DOMAIN_INFO. Then
>>> __intel_map_single gets called and it calls deferred_attach_domain,
>>> which sets the domain to the group domain, which in this case is the
>>> identity domain. Then it calls domain_get_iommu, which spits out the
>>> warning because the domain type was dma and returns null. My
>>> workaround was to add a call to iommu_need_mapping and find_domain
>>> after the deferred_attach_domain, but I don't know if that is the
>>> correct solution. There are a couple other spots like intel_map_sg
>>> that have the deferred_attach_domain after iommu_need_mapping that
>>> possibly will suffer from the same problem.
>>>
>>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>>> index b5c5ab58d395..063f45323cfc 100644
>>> --- a/drivers/iommu/intel-iommu.c
>>> +++ b/drivers/iommu/intel-iommu.c
>>> @@ -3515,6 +3515,10 @@ static dma_addr_t __intel_map_single(struct
>>> device *dev, phys_addr_t paddr,
>>>         if (!domain)
>>>                 return DMA_MAPPING_ERROR;
>>>
>>> +       if (!iommu_need_mapping(dev))
>>> +               return paddr;
>>> +
>>> +       domain = find_domain(dev);
>>>         iommu = domain_get_iommu(domain);
>>>         size = aligned_nrpages(paddr, size);
>>>
>>>
>>> I finally got a git repo over to one of these systems, and was
>>> able to reproduce the issue with the head of linus's tree. With commit
>>> 9235cb13d7d1 ("iommu/vt-d: Allow devices with RMRRs to use identity
>>> domain")
>>> there are more of the warnings, because devices are using identity that
>>> weren't before.
>>>
>>
>> Is it possible to move deferred domain attachment to identity_mapping()?
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 9dc37672bf89..234ab346198e 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -2913,13 +2913,11 @@ static int __init si_domain_init(int hw)
>>
>>  static int identity_mapping(struct device *dev)
>>  {
>> -       struct device_domain_info *info;
>> +       struct dmar_domain *domain;
>>
>> -       info = dev->archdata.iommu;
>> -       if (info && info != DUMMY_DEVICE_DOMAIN_INFO && info !=
>> DEFER_DEVICE_DOMAIN_INFO)
>> -               return (info->domain == si_domain);
>> +       domain = deferred_attach_domain(dev);
>>
>> -       return 0;
>> +       return (!domain || domain_type_is_si(domain));
>>  }
>>
>>  static int domain_add_dev_info(struct dmar_domain *domain, struct
>> device *dev)
>>
>> Best regards,
>> baolu
>
> Hi Baolu,
>
> I think that would work, and then change the deferred_attach_domain
> calls in __intel_map_single and intel_map_sg to find_domain?
>

Yes.

> I did a quick test with it on the system where I've been looking at this.
>

Thanks!

Best regards,
baolu
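
For anyone following the thread, here is a rough user-space sketch of the ordering problem discussed above. It is not kernel code. All the types and helper names (struct domain, deferred_attach, get_iommu_ok, map_single, warnings_for) are hypothetical stand-ins, and the real iommu_need_mapping() does more work (DMA mask checks, a possible switch to a DMA domain) than this model shows. The assumption it encodes is the one from the thread: a device whose attach is deferred is not seen as identity-mapped by the old check, so the map path completes the attach and then hands an identity domain to a lookup that only accepts DMA domains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's domain types. */
enum domain_type { DOMAIN_DMA, DOMAIN_IDENTITY };

struct domain {
	enum domain_type type;
};

struct device {
	struct domain *attached;     /* NULL while the attach is deferred */
	struct domain *group_domain; /* what a deferred attach resolves to */
};

static int warn_count; /* counts simulated WARN_ON hits */

/* Models deferred_attach_domain(): finish a pending attach. */
static struct domain *deferred_attach(struct device *dev)
{
	if (!dev->attached)
		dev->attached = dev->group_domain;
	return dev->attached;
}

/* Models domain_get_iommu(): warns and fails for non-DMA domains. */
static bool get_iommu_ok(const struct domain *d)
{
	if (d->type != DOMAIN_DMA) {
		warn_count++;
		return false;
	}
	return true;
}

/* Old check: a deferred device is never reported as identity-mapped. */
static bool identity_mapping_old(const struct device *dev)
{
	return dev->attached && dev->attached->type == DOMAIN_IDENTITY;
}

/* Proposed check: resolve the deferred attach before deciding. */
static bool identity_mapping_new(struct device *dev)
{
	struct domain *d = deferred_attach(dev);

	return !d || d->type == DOMAIN_IDENTITY;
}

/* Simplified map path; returns true only if an IOMMU mapping is set up. */
static bool map_single(struct device *dev, bool use_new_check)
{
	bool identity = use_new_check ? identity_mapping_new(dev)
				      : identity_mapping_old(dev);
	struct domain *d;

	if (identity)
		return false; /* bypass: the physical address is used as-is */

	/* Old flow attaches here; new flow only looks up the domain. */
	d = use_new_check ? dev->attached : deferred_attach(dev);
	if (!d)
		return false;
	return get_iommu_ok(d);
}

/* kdump-like case: attach is deferred and the group domain is identity. */
static int warnings_for(bool use_new_check)
{
	struct domain si_domain = { DOMAIN_IDENTITY };
	struct device dev = { NULL, &si_domain };

	warn_count = 0;
	map_single(&dev, use_new_check);
	return warn_count;
}
```

In this model, warnings_for(false) takes the old path and trips the simulated warning once, while warnings_for(true) resolves the deferred attach inside the identity check and bypasses the lookup without warning.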