Date: Wed, 5 May 2021 14:33:46 +0200
From: Pali Rohár
To: Greg KH
Cc: linux-usb@vger.kernel.org, linux-pci@vger.kernel.org,
 linux-kernel@vger.kernel.org, Marek Behún
Subject: Re: xhci_pci & PCIe hotplug crash
Message-ID: <20210505123346.kxfpumww5i4qmhnk@pali>
References:
 <20210505120117.4wpmo6fhvzznf3wv@pali>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wednesday 05 May 2021 14:09:17 Greg KH wrote:
> On Wed, May 05, 2021 at 02:01:17PM +0200, Pali Rohár wrote:
> > Hello!
> >
> > During debugging of pci-aardvark.c driver I got following synchronous
> > external abort 96000210 which I can reproduce with VIA XHCI controller
> > when PCIe hot plug support is enabled in kernel and PCIe Root Bridge
> > triggers link down event via PCIe hot plug interrupt.
> >
> > [   71.773033] pcieport 0000:00:00.0: pciehp: Slot(0): Link Down
> > [   71.779120] xhci_hcd 0000:01:00.0: remove, state 4
> > [   71.784113] usb usb5: USB disconnect, device number 1
> > [   71.790398] xhci_hcd 0000:01:00.0: USB bus 5 deregistered
> > [   72.511899] Internal error: synchronous external abort: 96000210 [#1] SMP
> > [   72.518918] Modules linked in:
> > [   72.522074] CPU: 1 PID: 988 Comm: irq/53-pciehp Not tainted 5.12.0-dirty #949
> > [   72.536983] pstate: 60000085 (nZCv daIf -PAN -UAO -TCO BTYPE=--)
> > [   72.543182] pc : xhci_irq+0x70/0x17b8
> > [   72.546972] lr : xhci_irq+0x28/0x17b8
> > [   72.550752] sp : ffffffc012b8bab0
> > [   72.554167] x29: ffffffc012b8bab0 x28: 00000000000000a0
> > [   72.559652] x27: 0000000000000060 x26: ffffff8000af2250
> > [   72.565135] x25: ffffffc0100b0d48 x24: ffffffc0100b0be0
> > [   72.570620] x23: ffffff80003be028 x22: ffffff8000af229c
> > [   72.576104] x21: 0000000000000080 x20: ffffff8000af2000
> > [   72.581587] x19: ffffff8000af2000 x18: 0000000000000004
> > [   72.587071] x17: 0000000000000000 x16: 0000000000000000
> > [   72.592553] x15: ffffffc01154cc70 x14: ffffff8001751df8
> > [   72.598037] x13: 0000000000000000 x12: 0000000000000000
> > [   72.603519] x11: ffffff8001751da8 x10: ffffffc01154cc78
> > [   72.609001] x9 : ffffffc01087c238 x8 : 0000000000000000
> > [   72.614485] x7 : ffffffc01162c4e0 x6 : 0000000000000000
> > [   72.619967] x5 : fffffffe00085000 x4 : fffffffe00085000
> > [   72.625451] x3 : 0000000000000000 x2 : 0000000000000001
> > [   72.630933] x1 : ffffffc0118bd024 x0 : 0000000000000000
> > [   72.636415] Call trace:
> > [   72.638936]  xhci_irq+0x70/0x17b8
> > [   72.642360]  usb_hcd_irq+0x34/0x50
> > [   72.645876]  usb_hcd_pci_remove+0x78/0x138
> > [   72.650106]  xhci_pci_remove+0x6c/0xa8
> > [   72.653978]  pci_device_remove+0x44/0x108
> > [   72.658122]  device_release_driver_internal+0x110/0x1e0
> > [   72.663521]  device_release_driver+0x1c/0x28
> > [   72.667931]  pci_stop_bus_device+0x84/0xc0
> > [   72.672162]  pci_stop_and_remove_bus_device+0x1c/0x30
> > [   72.677373]  pciehp_unconfigure_device+0x98/0xf8
> > [   72.682138]  pciehp_disable_slot+0x60/0x118
> > [   72.686457]  pciehp_handle_presence_or_link_change+0xec/0x3b0
> > [   72.692386]  pciehp_ist+0x170/0x1a0
> > [   72.695984]  irq_thread_fn+0x30/0x90
> > [   72.699674]  irq_thread+0x13c/0x200
> > [   72.703271]  kthread+0x12c/0x130
> > [   72.706603]  ret_from_fork+0x10/0x1c
> > [   72.710299] Code: 35ffff83 35002741 f9400f41 91001021 (b9400021)
> > [   72.716586] ---[ end trace 20ce3e30ff292c93 ]---
> > [   72.721453] genirq: exiting task "irq/53-pciehp" (988) is an active IRQ thread (irq 53)
> > [   72.730068] sched: RT throttling activated
> >
> > And after that kernel is in some semi-broken state. Some functionality
> > works, but some other (like reboot) does not.
> >
> > I can reproduce it also when I manually inject/fake this link down PCIe
> > hot plug interrupt with setting corresponding bits in PCIe Root Status
> > registers, so pciehp driver thinks that link down even occurred.
> >
> > I suspect that issue is in usb_hcd_pci_remove() function which calls
> > local_irq_disable()+usb_hcd_irq()+local_irq_enable() functions but do
> > not take into care that whole usb_hcd_pci_remove() function may be
> > called from interrupt context.
>
> usb_hcd_pci_remove() should NOT be called from interrupt context.
>
> What is causing that to happen?

A PCIe Hot Plug interrupt with the PCI_EXP_SLTSTA_DLLSC status bit set.

I can reproduce it by issuing a PCIe Hot Reset to the PCIe controller (via
setpci from userspace), which results in a link down event (as expected);
the PCIe controller then triggers the link down interrupt.

> No PCI driver can handle that, especially USB ones.
>
> > Can you look at this issue if it is really safe to call usb_hcd_irq()
> > from interrupt context? Or rather if it is safe to call functions like
> > pciehp_disable_slot() or device_release_driver() from interrupt context
> > like it can be seen in call trace?
>
> What is removing devices from an irq?

It can be seen in the call trace above. It is pciehp_disable_slot()
followed by pciehp_unconfigure_device().

> That is wrong, pci hotplug never used to do that, what recently changed?

I really do not know what changed recently. I hope that other people on
the linux-pci ML know the history better.

I just spotted this crash while debugging the PCIe controller driver
pci-aardvark.c, when trying to expose its link down events via the "hot
plug" interrupt and the corresponding link layer state flags.

And because the whole call trace shows only generic PCIe and USB code
paths without any driver-specific parts, I suspect that this is not a
PCIe controller-specific issue but rather something "wrong" in the
generic PCIe (or USB) code.

That is why I sent this email, so that maybe somebody else finds
something suspicious here.

But there is still a chance that the issue is in the pci-aardvark.c
driver, and that it somehow masked its own issue and propagated it into
the generic PCIe hot plug code path.

> thanks,
>
> greg k-h