From: Chris Wilson
To: Peter Xu, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org
Cc: joro@8bytes.org, peterx@redhat.com, Lu Baolu, dave.jiang@intel.com
Subject: Re: [PATCH] Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock"
Date: Fri, 21 Jun 2019 14:19:10 +0100
Message-ID: <156112315020.2401.16873297513079645766@skylake-alporthouse-com>
In-Reply-To: <20190621023205.12936-1-peterx@redhat.com>
References: <20190621023205.12936-1-peterx@redhat.com>
Quoting Peter Xu (2019-06-21 03:32:05)
> This reverts commit 7560cc3ca7d9d11555f80c830544e463fcdb28b8.
>
> With 5.2.0-rc5 I can easily trigger this with lockdep and iommu=pt:
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.2.0-rc5 #78 Not tainted
> ------------------------------------------------------
> swapper/0/1 is trying to acquire lock:
> 00000000ea2b3beb (&(&iommu->lock)->rlock){+.+.}, at: domain_context_mapping_one+0xa5/0x4e0
>
> but task is already holding lock:
> 00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (device_domain_lock){....}:
>        _raw_spin_lock_irqsave+0x3c/0x50
>        dmar_insert_one_dev_info+0xbb/0x510
>        domain_add_dev_info+0x50/0x90
>        dev_prepare_static_identity_mapping+0x30/0x68
>        intel_iommu_init+0xddd/0x1422
>        pci_iommu_init+0x16/0x3f
>        do_one_initcall+0x5d/0x2b4
>        kernel_init_freeable+0x218/0x2c1
>        kernel_init+0xa/0x100
>        ret_from_fork+0x3a/0x50
>
> -> #0 (&(&iommu->lock)->rlock){+.+.}:
>        lock_acquire+0x9e/0x170
>        _raw_spin_lock+0x25/0x30
>        domain_context_mapping_one+0xa5/0x4e0
>        pci_for_each_dma_alias+0x30/0x140
>        dmar_insert_one_dev_info+0x3b2/0x510
>        domain_add_dev_info+0x50/0x90
>        dev_prepare_static_identity_mapping+0x30/0x68
>        intel_iommu_init+0xddd/0x1422
>        pci_iommu_init+0x16/0x3f
>        do_one_initcall+0x5d/0x2b4
>        kernel_init_freeable+0x218/0x2c1
>        kernel_init+0xa/0x100
>        ret_from_fork+0x3a/0x50
>
> other info that might help us debug this:
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(device_domain_lock);
>                                lock(&(&iommu->lock)->rlock);
>                                lock(device_domain_lock);
>   lock(&(&iommu->lock)->rlock);
>
>  *** DEADLOCK ***
>
> 2 locks held by swapper/0/1:
>  #0: 00000000033eb13d (dmar_global_lock){++++}, at: intel_iommu_init+0x1e0/0x1422
>  #1: 00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0
>
> stack backtrace:
> CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5 #78
> Hardware name: LENOVO 20KGS35G01/20KGS35G01, BIOS N23ET50W (1.25 ) 06/25/2018
> Call Trace:
>  dump_stack+0x85/0xc0
>  print_circular_bug.cold.57+0x15c/0x195
>  __lock_acquire+0x152a/0x1710
>  lock_acquire+0x9e/0x170
>  ? domain_context_mapping_one+0xa5/0x4e0
>  _raw_spin_lock+0x25/0x30
>  ? domain_context_mapping_one+0xa5/0x4e0
>  domain_context_mapping_one+0xa5/0x4e0
>  ? domain_context_mapping_one+0x4e0/0x4e0
>  pci_for_each_dma_alias+0x30/0x140
>  dmar_insert_one_dev_info+0x3b2/0x510
>  domain_add_dev_info+0x50/0x90
>  dev_prepare_static_identity_mapping+0x30/0x68
>  intel_iommu_init+0xddd/0x1422
>  ? printk+0x58/0x6f
>  ? lockdep_hardirqs_on+0xf0/0x180
>  ? do_early_param+0x8e/0x8e
>  ? e820__memblock_setup+0x63/0x63
>  pci_iommu_init+0x16/0x3f
>  do_one_initcall+0x5d/0x2b4
>  ? do_early_param+0x8e/0x8e
>  ? rcu_read_lock_sched_held+0x55/0x60
>  ? do_early_param+0x8e/0x8e
>  kernel_init_freeable+0x218/0x2c1
>  ? rest_init+0x230/0x230
>  kernel_init+0xa/0x100
>  ret_from_fork+0x3a/0x50
>
> domain_context_mapping_one() is taking device_domain_lock first and then
> the iommu lock, while dmar_insert_one_dev_info() is doing the reverse.
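In userspace terms that is the classic ABBA pattern. A minimal sketch of it
(not the actual intel-iommu code: pthread mutexes stand in for the kernel
spinlocks, and the two thread functions are hypothetical stand-ins for the
two call paths named in the report):

/* abba.c -- build with: gcc -pthread abba.c -o abba
 * Two threads take the same pair of mutexes in opposite order; run
 * long enough, the program deadlocks -- exactly the cycle lockdep is
 * warning about before it happens. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t device_domain_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t iommu_lock = PTHREAD_MUTEX_INITIALIZER;

/* Order the report attributes to domain_context_mapping_one():
 * device_domain_lock first, then iommu->lock. */
static void *path_a(void *arg)
{
        for (int i = 0; i < 1000000; i++) {
                pthread_mutex_lock(&device_domain_lock);
                pthread_mutex_lock(&iommu_lock);
                pthread_mutex_unlock(&iommu_lock);
                pthread_mutex_unlock(&device_domain_lock);
        }
        return NULL;
}

/* Order the report attributes to dmar_insert_one_dev_info():
 * iommu->lock first, then device_domain_lock -- the reverse. */
static void *path_b(void *arg)
{
        for (int i = 0; i < 1000000; i++) {
                pthread_mutex_lock(&iommu_lock);
                pthread_mutex_lock(&device_domain_lock);
                pthread_mutex_unlock(&device_domain_lock);
                pthread_mutex_unlock(&iommu_lock);
        }
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, path_a, NULL);
        pthread_create(&b, NULL, path_b, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        puts("survived this run; the deadlock is timing-dependent");
        return 0;
}
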
> That inversion was introduced by commit:
>
> 7560cc3ca7d9 ("iommu/vt-d: Fix lock inversion between iommu->lock and
> device_domain_lock", 2019-05-27)
>
> So far I still cannot figure out how the previous deadlock was
> triggered (I cannot find the iommu lock being taken before the call to
> iommu_flush_dev_iotlb()). However, I'm fairly sure that change is
> incomplete: it does not fix all the call sites, so the locks are still
> taken in different orders, whereas reverting it cleanly restores the
> rule that device_domain_lock is always taken before the iommu lock.
>
> We can keep looking for the real culprit mentioned in 7560cc3ca7d9,
> but for now I think we should revert it to fix the current breakage.
>
> CC: Joerg Roedel
> CC: Lu Baolu
> CC: dave.jiang@intel.com
> Signed-off-by: Peter Xu

I've run this through our CI, which was also reporting the inversion, so

Tested-by: Chris Wilson
-Chris
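P.S. For completeness, the ordering rule the revert restores, in the same
userspace sketch as above (a drop-in replacement for the hypothetical
path_b(); with both threads agreeing on the order, no cycle can form):

/* After the revert, both paths take device_domain_lock before
 * iommu->lock, so neither thread can hold one lock while waiting
 * on a thread that holds the other. */
static void *path_b_fixed(void *arg)
{
        for (int i = 0; i < 1000000; i++) {
                pthread_mutex_lock(&device_domain_lock); /* always first */
                pthread_mutex_lock(&iommu_lock);         /* always second */
                pthread_mutex_unlock(&iommu_lock);
                pthread_mutex_unlock(&device_domain_lock);
        }
        return NULL;
}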