Date: Wed, 13 Dec 2023 11:44:17 +0100
From: Lukas Wunner
To: Ethan Zhao
Cc: bhelgaas@google.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, will@kernel.org, robin.murphy@arm.com, linux-pci@vger.kernel.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, Haorong Ye
Subject: Re: [PATCH 2/2] iommu/vt-d: don's issue devTLB flush request when device is disconnected
Message-ID: <20231213104417.GA31964@wunner.de>
References: <20231213034637.2603013-1-haifeng.zhao@linux.intel.com> <20231213034637.2603013-3-haifeng.zhao@linux.intel.com>
In-Reply-To: <20231213034637.2603013-3-haifeng.zhao@linux.intel.com>

On Tue, Dec 12, 2023 at 10:46:37PM -0500, Ethan Zhao wrote:
> For those endpoint devices connect to system via hotplug capable ports,
> users could request a warm reset to the device by flapping device's link
> through setting the slot's link control register,

Well, users could just *unplug* the device, right?  Why is it relevant
that they could fiddle with registers in config space?

> as pciehpt_ist() DLLSC
> interrupt sequence response, pciehp will unload the device driver and
> then power it off. thus cause an IOMMU devTLB flush request for device to
> be sent and a long time completion/timeout waiting in interrupt context.

A completion timeout should be on the order of usecs or msecs.  Why does
it cause a hard lockup?  The dmesg excerpt you've provided shows a
12 *second* delay between hot removal and watchdog reaction.

> Fix it by checking the device's error_state in
> devtlb_invalidation_with_pasid() to avoid sending meaningless devTLB flush
> request to link down device that is set to pci_channel_io_perm_failure and
> then powered off in

This doesn't seem to be a proper fix.  It will work most of the time
but not always.  A user might bring down the slot via sysfs, then yank
the card from the slot just when the iommu flush occurs such that the
pci_dev_is_disconnected(pdev) check returns false but the card is
physically gone immediately afterwards.  In other words, you've shrunk
the time window during which the issue may occur, but haven't eliminated
it completely.

Thanks,

Lukas
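
To illustrate the window described above, here is a minimal userspace
sketch of the check-then-act pattern.  It is not kernel code: struct
fake_dev, yank_card() and flush_if_connected() are hypothetical names
made up for this illustration, and the "hook" parameter only models a
surprise removal happening between the check and the flush.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a hot-pluggable device, not a kernel struct. */
struct fake_dev {
	bool disconnected;	/* what a pci_dev_is_disconnected()-style check sees */
	bool present;		/* whether the card is physically in the slot */
};

/* Hook run between the check and the flush, modelling a racing event. */
typedef void (*race_hook_t)(struct fake_dev *dev);

enum flush_result { FLUSH_SKIPPED, FLUSH_COMPLETED, FLUSH_TIMED_OUT };

/* Surprise removal: the card is gone, but the flag is not updated yet. */
static void yank_card(struct fake_dev *dev)
{
	dev->present = false;
}

/* Check-then-act: skip the flush only if the device is already marked gone. */
static enum flush_result flush_if_connected(struct fake_dev *dev,
					    race_hook_t hook)
{
	if (dev->disconnected)
		return FLUSH_SKIPPED;

	/* Window: anything can happen to the device from here on. */
	if (hook)
		hook(dev);

	/* A flush sent to an absent device never completes. */
	return dev->present ? FLUSH_COMPLETED : FLUSH_TIMED_OUT;
}
```

Calling flush_if_connected() with yank_card as the hook on a device
that passes the check still ends in FLUSH_TIMED_OUT, which is the case
the check cannot cover.  With a NULL hook the same device completes
the flush, and a device already marked disconnected is skipped.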