Received: by 2002:ab2:6857:0:b0:1ef:ffd0:ce49 with SMTP id l23csp2564829lqp; Mon, 25 Mar 2024 02:47:21 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCW2CyLWasjcZSTFJiqCtZUZvuOAj9NHuw+H09BxBmgKcdPk/EdG7Cp1gXnR21aNWyoMD0o6gS0WXtIo5oRzrLBYJQKwu7UnEc3hTgWJxA== X-Google-Smtp-Source: AGHT+IE3l6/lk78mNe54nPI/wCxyICHbyZQdXKRewHKZmqwHAKZoHRzAvMuX+IdnZzxfqeTDKRXX X-Received: by 2002:a05:622a:4d0:b0:431:2813:8075 with SMTP id q16-20020a05622a04d000b0043128138075mr7869506qtx.66.1711360041675; Mon, 25 Mar 2024 02:47:21 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1711360041; cv=pass; d=google.com; s=arc-20160816; b=egiBLFwBCLEjhcKVsvmTwqZxiZ/0wv4gI6jeBqJQPu/wxnEZglzH/Da0Drdy0RWtOM /xuTq6XhunMO9zMzbPfvO1mfssWZMxdtmHSn/ycCnMd6mmxnM2opdoIwKdY4tEdlbmoJ iAxh4PicajnZj0b2yTzcQqR3zPvO2qrF00II6CuX9OvtWwWzVQ2S+MasAQqoBjhVZDyU 6xl1v5MrOcp/0/mK7tXr4sTTUlYxuJZCj1l+kvwdA40hqNBJsNETOhDU8xZGoiXlkgp5 8x0OUnVjkfKGArkPTkvEAwpNxmOL1G3WKdruEKYZYoNKTc4U7q+I3TgbY3i6uRTMDm36 92Bg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=gGazaOsEvZnOfwhzd6AsBVOVG99mhB/+GzHLzepM5/w=; fh=LtDv/SBeEWnWK/66gnk40orU6FDARRvvrcTexcudV48=; b=HwDk9o9szMgBuRiTbxNRcKD8IaVKIYjWPAjXRLSmafxVb9zWelhzZUIeRf8SzcRG+R zuGfoiUW6g9DWlK5CxwRUlZ9HxEKFXEmFJqNGO2zKB4TLyAuGbrW8mVbsANiDqelZaXy +38XVLoIpYa3e9m3ZbnkE6gutzdUgsEBJuBtfXyJ8iimvuoBmwRNFCr0B/rCMWVS5vSz WtAAgnpZL8eHhW3VfUgMuIAT7vTJ0+TV+mactzprtNkvm5Yi30W4+3ydakDWfO4ArrlY 5AItbP1LpGnDvAqxgB54CFYPjDU4bKcd7CAAYWdWhib4Eh8XaFTg9oBEsHDn835yLc95 M8hQ==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=YWzd4fNZ; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-115923-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-115923-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [2604:1380:45d1:ec00::1]) by mx.google.com with ESMTPS id z7-20020a05622a124700b00430d8ca526dsi5226712qtx.762.2024.03.25.02.47.21 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 25 Mar 2024 02:47:21 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-115923-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=YWzd4fNZ; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-115923-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-115923-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 59D401C30AB3 for ; Mon, 25 Mar 2024 09:47:21 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 0DE4E38BD0C; Mon, 25 Mar 2024 03:03:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="YWzd4fNZ" Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AD51D14D2A2; Sun, 24 Mar 2024 23:15:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711322118; cv=none; b=qqkUWRt2Sd1jdkA1r7Tc2qWthv7KOIBsU0Y1GFFu3hPW39NPJhvd2lHKeuaaUPbZaG5Ch+9enN2q3BVT1gf0xYpdOZlTAGTJlKJYVFmCaTBXqBqxFLOKeE83qiGoZYDX1xXeDaQYuHYoU/luC/0aDDeS+YGjw1acWJV094tL4mU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711322118; c=relaxed/simple; bh=7Bcu9tfR+ZVwOZAKGOTpB0rnsxr0uOMF4ML8QEPZ6wM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=hekOv6+74AHLVBKz9ocQxI5+ZxClssTleb+kf8HuLJElM2vkEKGEAibA3lR+UPDPi00JnvDuqWZu1fyrj+Dfxp8/K86KBtA4+dzs5qg2hHRbm1tDKqEPwGG5YFGF0rNjCUrHWKm/FdZ/EmB2klWSj16t6oF777JaUyQlzco8iGo= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=YWzd4fNZ; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id AA355C43394; Sun, 24 Mar 2024 23:15:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1711322117; bh=7Bcu9tfR+ZVwOZAKGOTpB0rnsxr0uOMF4ML8QEPZ6wM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=YWzd4fNZiZNKexEhuThgSQcupW8op5OejPLRJH5zVmz+KI6vXbAysfJvdkD1u8Vtz ljzURNCkgpWVhjn2X9drhL7F6cb7zKcYG4NTPOGJttrGp/sBTjl31bhA5QojJj579h m6eLIXOrYrk18suLpTx6YJLQtIS8/ibfbvbG4nAwYPIYrmlubJhOzPl8zpw78/8UCW GdM6xUL8t8OdYKV0YtShqU9XjAFP3cteQiVxaWyhstZeE3q87TOra0Kg2W6QXdBGPO FQZxNm5W0irY4e3l1qEFQwzi/5I1WKUSc6MhpI/yGqkjucN5LPOx+EqCKoGz8EJpqd hPfuylp197KJw== From: Sasha Levin To: linux-kernel@vger.kernel.org, stable@vger.kernel.org Cc: Ethan Zhao , Dan Carpenter , Haorong Ye , Lu Baolu , Joerg Roedel , Sasha Levin Subject: [PATCH 6.1 192/451] iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected Date: Sun, 24 Mar 2024 19:07:48 -0400 Message-ID: <20240324231207.1351418-193-sashal@kernel.org> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240324231207.1351418-1-sashal@kernel.org> References: <20240324231207.1351418-1-sashal@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-stable: review X-Patchwork-Hint: Ignore Content-Transfer-Encoding: 8bit From: Ethan Zhao [ Upstream commit 4fc82cd907ac075648789cc3a00877778aa1838b ] For those endpoint devices connect to system via hotplug capable ports, users could request a hot reset to the device by flapping device's link through setting the slot's link control register, as pciehp_ist() DLLSC interrupt sequence response, pciehp will unload the device driver and then power it off. thus cause an IOMMU device-TLB invalidation (Intel VT-d spec, or ATS Invalidation in PCIe spec r6.1) request for non-existence target device to be sent and deadly loop to retry that request after ITE fault triggered in interrupt context. That would cause following continuous hard lockup warning and system hang [ 4211.433662] pcieport 0000:17:01.0: pciehp: Slot(108): Link Down [ 4211.433664] pcieport 0000:17:01.0: pciehp: Slot(108): Card not present [ 4223.822591] NMI watchdog: Watchdog detected hard LOCKUP on cpu 144 [ 4223.822622] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded Tainted: G S OE kernel version xxxx [ 4223.822623] Hardware name: vendorname xxxx 666-106, BIOS 01.01.02.03.01 05/15/2023 [ 4223.822623] RIP: 0010:qi_submit_sync+0x2c0/0x490 [ 4223.822624] Code: 48 be 00 00 00 00 00 08 00 00 49 85 74 24 20 0f 95 c1 48 8b 57 10 83 c1 04 83 3c 1a 03 0f 84 a2 01 00 00 49 8b 04 24 8b 70 34 <40> f6 c6 1 0 74 17 49 8b 04 24 8b 80 80 00 00 00 89 c2 d3 fa 41 39 [ 4223.822624] RSP: 0018:ffffc4f074f0bbb8 EFLAGS: 00000093 [ 4223.822625] RAX: ffffc4f040059000 RBX: 0000000000000014 RCX: 0000000000000005 [ 4223.822625] RDX: ffff9f3841315800 RSI: 0000000000000000 RDI: ffff9f38401a8340 [ 4223.822625] RBP: ffff9f38401a8340 R08: ffffc4f074f0bc00 R09: 0000000000000000 [ 4223.822626] R10: 0000000000000010 R11: 0000000000000018 R12: ffff9f384005e200 [ 4223.822626] R13: 0000000000000004 R14: 0000000000000046 R15: 0000000000000004 [ 4223.822626] FS: 0000000000000000(0000) GS:ffffa237ae400000(0000) knlGS:0000000000000000 [ 4223.822627] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4223.822627] CR2: 00007ffe86515d80 CR3: 000002fd3000a001 CR4: 0000000000770ee0 [ 4223.822627] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4223.822628] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 4223.822628] PKRU: 55555554 [ 4223.822628] Call Trace: [ 4223.822628] qi_flush_dev_iotlb+0xb1/0xd0 [ 4223.822628] __dmar_remove_one_dev_info+0x224/0x250 [ 4223.822629] dmar_remove_one_dev_info+0x3e/0x50 [ 4223.822629] intel_iommu_release_device+0x1f/0x30 [ 4223.822629] iommu_release_device+0x33/0x60 [ 4223.822629] iommu_bus_notifier+0x7f/0x90 [ 4223.822630] blocking_notifier_call_chain+0x60/0x90 [ 4223.822630] device_del+0x2e5/0x420 [ 4223.822630] pci_remove_bus_device+0x70/0x110 [ 4223.822630] pciehp_unconfigure_device+0x7c/0x130 [ 4223.822631] pciehp_disable_slot+0x6b/0x100 [ 4223.822631] pciehp_handle_presence_or_link_change+0xd8/0x320 [ 4223.822631] pciehp_ist+0x176/0x180 [ 4223.822631] ? irq_finalize_oneshot.part.50+0x110/0x110 [ 4223.822632] irq_thread_fn+0x19/0x50 [ 4223.822632] irq_thread+0x104/0x190 [ 4223.822632] ? irq_forced_thread_fn+0x90/0x90 [ 4223.822632] ? irq_thread_check_affinity+0xe0/0xe0 [ 4223.822633] kthread+0x114/0x130 [ 4223.822633] ? __kthread_cancel_work+0x40/0x40 [ 4223.822633] ret_from_fork+0x1f/0x30 [ 4223.822633] Kernel panic - not syncing: Hard LOCKUP [ 4223.822634] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded Tainted: G S OE kernel version xxxx [ 4223.822634] Hardware name: vendorname xxxx 666-106, BIOS 01.01.02.03.01 05/15/2023 [ 4223.822634] Call Trace: [ 4223.822634] [ 4223.822635] dump_stack+0x6d/0x88 [ 4223.822635] panic+0x101/0x2d0 [ 4223.822635] ? ret_from_fork+0x11/0x30 [ 4223.822635] nmi_panic.cold.14+0xc/0xc [ 4223.822636] watchdog_overflow_callback.cold.8+0x6d/0x81 [ 4223.822636] __perf_event_overflow+0x4f/0xf0 [ 4223.822636] handle_pmi_common+0x1ef/0x290 [ 4223.822636] ? __set_pte_vaddr+0x28/0x40 [ 4223.822637] ? flush_tlb_one_kernel+0xa/0x20 [ 4223.822637] ? __native_set_fixmap+0x24/0x30 [ 4223.822637] ? ghes_copy_tofrom_phys+0x70/0x100 [ 4223.822637] ? __ghes_peek_estatus.isra.16+0x49/0xa0 [ 4223.822637] intel_pmu_handle_irq+0xba/0x2b0 [ 4223.822638] perf_event_nmi_handler+0x24/0x40 [ 4223.822638] nmi_handle+0x4d/0xf0 [ 4223.822638] default_do_nmi+0x49/0x100 [ 4223.822638] exc_nmi+0x134/0x180 [ 4223.822639] end_repeat_nmi+0x16/0x67 [ 4223.822639] RIP: 0010:qi_submit_sync+0x2c0/0x490 [ 4223.822639] Code: 48 be 00 00 00 00 00 08 00 00 49 85 74 24 20 0f 95 c1 48 8b 57 10 83 c1 04 83 3c 1a 03 0f 84 a2 01 00 00 49 8b 04 24 8b 70 34 <40> f6 c6 10 74 17 49 8b 04 24 8b 80 80 00 00 00 89 c2 d3 fa 41 39 [ 4223.822640] RSP: 0018:ffffc4f074f0bbb8 EFLAGS: 00000093 [ 4223.822640] RAX: ffffc4f040059000 RBX: 0000000000000014 RCX: 0000000000000005 [ 4223.822640] RDX: ffff9f3841315800 RSI: 0000000000000000 RDI: ffff9f38401a8340 [ 4223.822641] RBP: ffff9f38401a8340 R08: ffffc4f074f0bc00 R09: 0000000000000000 [ 4223.822641] R10: 0000000000000010 R11: 0000000000000018 R12: ffff9f384005e200 [ 4223.822641] R13: 0000000000000004 R14: 0000000000000046 R15: 0000000000000004 [ 4223.822641] ? qi_submit_sync+0x2c0/0x490 [ 4223.822642] ? qi_submit_sync+0x2c0/0x490 [ 4223.822642] [ 4223.822642] qi_flush_dev_iotlb+0xb1/0xd0 [ 4223.822642] __dmar_remove_one_dev_info+0x224/0x250 [ 4223.822643] dmar_remove_one_dev_info+0x3e/0x50 [ 4223.822643] intel_iommu_release_device+0x1f/0x30 [ 4223.822643] iommu_release_device+0x33/0x60 [ 4223.822643] iommu_bus_notifier+0x7f/0x90 [ 4223.822644] blocking_notifier_call_chain+0x60/0x90 [ 4223.822644] device_del+0x2e5/0x420 [ 4223.822644] pci_remove_bus_device+0x70/0x110 [ 4223.822644] pciehp_unconfigure_device+0x7c/0x130 [ 4223.822644] pciehp_disable_slot+0x6b/0x100 [ 4223.822645] pciehp_handle_presence_or_link_change+0xd8/0x320 [ 4223.822645] pciehp_ist+0x176/0x180 [ 4223.822645] ? irq_finalize_oneshot.part.50+0x110/0x110 [ 4223.822645] irq_thread_fn+0x19/0x50 [ 4223.822646] irq_thread+0x104/0x190 [ 4223.822646] ? irq_forced_thread_fn+0x90/0x90 [ 4223.822646] ? irq_thread_check_affinity+0xe0/0xe0 [ 4223.822646] kthread+0x114/0x130 [ 4223.822647] ? __kthread_cancel_work+0x40/0x40 [ 4223.822647] ret_from_fork+0x1f/0x30 [ 4223.822647] Kernel Offset: 0x6400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Such issue could be triggered by all kinds of regular surprise removal hotplug operation. like: 1. pull EP(endpoint device) out directly. 2. turn off EP's power. 3. bring the link down. etc. this patch aims to work for regular safe removal and surprise removal unplug. these hot unplug handling process could be optimized for fix the ATS Invalidation hang issue by calling pci_dev_is_disconnected() in function devtlb_invalidation_with_pasid() to check target device state to avoid sending meaningless ATS Invalidation request to iommu when device is gone. (see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1) For safe removal, device wouldn't be removed until the whole software handling process is done, it wouldn't trigger the hard lock up issue caused by too long ATS Invalidation timeout wait. In safe removal path, device state isn't set to pci_channel_io_perm_failure in pciehp_unconfigure_device() by checking 'presence' parameter, calling pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will return false there, wouldn't break the function. For surprise removal, device state is set to pci_channel_io_perm_failure in pciehp_unconfigure_device(), means device is already gone (disconnected) call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will return true to break the function not to send ATS Invalidation request to the disconnected device blindly, thus avoid to trigger further ITE fault, and ITE fault will block all invalidation request to be handled. furthermore retry the timeout request could trigger hard lockup. safe removal (present) & surprise removal (not present) pciehp_ist() pciehp_handle_presence_or_link_change() pciehp_disable_slot() remove_board() pciehp_unconfigure_device(presence) { if (!presence) pci_walk_bus(parent, pci_dev_set_disconnected, NULL); } this patch works for regular safe removal and surprise removal of ATS capable endpoint on PCIe switch downstream ports. Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table interface") Reviewed-by: Dan Carpenter Tested-by: Haorong Ye Signed-off-by: Ethan Zhao Link: https://lore.kernel.org/r/20240301080727.3529832-3-haifeng.zhao@linux.intel.com Signed-off-by: Lu Baolu Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/intel/pasid.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c index 3f03039e5cce5..32432d82d7744 100644 --- a/drivers/iommu/intel/pasid.c +++ b/drivers/iommu/intel/pasid.c @@ -435,6 +435,9 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu, if (!info || !info->ats_enabled) return; + if (pci_dev_is_disconnected(to_pci_dev(dev))) + return; + sid = info->bus << 8 | info->devfn; qdep = info->ats_qdep; pfsid = info->pfsid; -- 2.43.0