X-Mailing-List: linux-kernel@vger.kernel.org
References: <20230518072657.1.If9539da710217ed92e764cc0ba0f3d2d246a1aee@changeid> <5184214b-22ed-41cf-a1b0-6db2d4ff324c@gmx.net>
In-Reply-To: <5184214b-22ed-41cf-a1b0-6db2d4ff324c@gmx.net>
From: Ying Hsu
Date: Thu, 6 Jun 2024 16:03:59 +0800
Subject: Re: [PATCH] igb: Fix igb_down hung on surprise removal
To: Stefan Schaeckeler
Cc: netdev@vger.kernel.org, grundler@chromium.org, "David S. Miller", Eric Dumazet, Jakub Kicinski, Jesse Brandeburg, Paolo Abeni, Tony Nguyen, intel-wired-lan@lists.osuosl.org, linux-kernel@vger.kernel.org

On the CalDigit Thunderbolt Station 3 Plus, we've encountered an issue
when the USB downstream display connection state changes. The
problematic sequence observed is:
```
igb_io_error_detected
igb_down
igb_io_error_detected
igb_down
```
The second igb_down call blocks at napi_synchronize.
Simply avoiding redundant igb_down calls makes the Ethernet of the
Thunderbolt dock unusable.
If Intel can identify when an Ethernet device is within a Thunderbolt
tunnel, the patch can be more specific.

On Thu, Jun 6, 2024 at 8:04 AM Stefan Schaeckeler wrote:
>
> Hi all,
>
> This commit introduced a regression. What does not work with this commit is an AER without a surprise removal event.
>
>
> On 5/18/23 00:26, Ying Hsu wrote:
> > In a setup where a Thunderbolt hub connects to Ethernet and a display
> > through USB Type-C, users may experience a hung task timeout when they
> > remove the cable between the PC and the Thunderbolt hub.
> > This is because the igb_down function is called multiple times when
> > the Thunderbolt hub is unplugged. For example, the igb_io_error_detected
> > triggers the first call, and the igb_remove triggers the second call.
> > The second call to igb_down will block at napi_synchronize.
> > Here's the call trace:
> >     __schedule+0x3b0/0xddb
> >     ? __mod_timer+0x164/0x5d3
> >     schedule+0x44/0xa8
> >     schedule_timeout+0xb2/0x2a4
> >     ? run_local_timers+0x4e/0x4e
> >     msleep+0x31/0x38
> >     igb_down+0x12c/0x22a [igb 6615058754948bfde0bf01429257eb59f13030d4]
> >     __igb_close+0x6f/0x9c [igb 6615058754948bfde0bf01429257eb59f13030d4]
> >     igb_close+0x23/0x2b [igb 6615058754948bfde0bf01429257eb59f13030d4]
> >     __dev_close_many+0x95/0xec
> >     dev_close_many+0x6e/0x103
> >     unregister_netdevice_many+0x105/0x5b1
> >     unregister_netdevice_queue+0xc2/0x10d
> >     unregister_netdev+0x1c/0x23
> >     igb_remove+0xa7/0x11c [igb 6615058754948bfde0bf01429257eb59f13030d4]
> >     pci_device_remove+0x3f/0x9c
> >     device_release_driver_internal+0xfe/0x1b4
> >     pci_stop_bus_device+0x5b/0x7f
> >     pci_stop_bus_device+0x30/0x7f
> >     pci_stop_bus_device+0x30/0x7f
> >     pci_stop_and_remove_bus_device+0x12/0x19
> >     pciehp_unconfigure_device+0x76/0xe9
> >     pciehp_disable_slot+0x6e/0x131
> >     pciehp_handle_presence_or_link_change+0x7a/0x3f7
> >     pciehp_ist+0xbe/0x194
> >     irq_thread_fn+0x22/0x4d
> >     ? irq_thread+0x1fd/0x1fd
> >     irq_thread+0x17b/0x1fd
> >     ? irq_forced_thread_fn+0x5f/0x5f
> >     kthread+0x142/0x153
> >     ? __irq_get_irqchip_state+0x46/0x46
> >     ? kthread_associate_blkcg+0x71/0x71
> >     ret_from_fork+0x1f/0x30
> >
> > In this case, igb_io_error_detected detaches the network interface
> > and requests a PCIE slot reset, however, the PCIE reset callback is
> > not being invoked and thus the Ethernet connection breaks down.
> > As the PCIE error in this case is a non-fatal one, requesting a
> > slot reset can be avoided.
> > This patch fixes the task hung issue and preserves Ethernet
> > connection by ignoring non-fatal PCIE errors.
> >
> > Signed-off-by: Ying Hsu
> > ---
> > This commit has been tested on a HP Elite Dragonfly Chromebook and
> > a Caldigit TS3+ Thunderbolt hub. The Ethernet driver for the hub
> > is igb. Non-fatal PCIE errors happen when users hot-plug the cables
> > connected to the chromebook or to the external display.
> >
> >  drivers/net/ethernet/intel/igb/igb_main.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> > index 58872a4c2540..a8b217368ca1 100644
> > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > @@ -9581,6 +9581,11 @@ static pci_ers_result_t igb_io_error_detected(struct pci_dev *pdev,
> >         struct net_device *netdev = pci_get_drvdata(pdev);
> >         struct igb_adapter *adapter = netdev_priv(netdev);
> >
> > +       if (state == pci_channel_io_normal) {
> > +               dev_warn(&pdev->dev, "Non-correctable non-fatal error reported.\n");
> > +               return PCI_ERS_RESULT_CAN_RECOVER;
> > +       }
> > +
> >         netif_device_detach(netdev);
> >
> >         if (state == pci_channel_io_perm_failure)
>
> We are currently stuck with the 5.4 kernel - our embedded system can't easily boot arbitrary kernel versions. The igb driver code has not changed and I'm quite positive the issue still persist in the latest upstream kernel.
>
> This issue is reproducible with aer_inject.
> 09:00.0 is our i210 which is directly connected to the cpu root port 00:01.1:
>
>
> - - - snip - - -
> [node0_RP0_CPU0:~]$cat aer.log
> AER
> PCI_ID 09:00.0
> UNCOR_STATUS COMP_TIME
> HEADER_LOG 0 1 2 3
> - - - snip - - -
>
> - - - snip - - -
> [node0_RP0_CPU0:~]$aer-inject aer.log
> [369145.900845] pcieport 0000:00:01.1: aer_inject: Injecting errors 00000000/00004000 into device 0000:09:00.0
> [369145.912726] pcieport 0000:00:01.1: AER: Uncorrected (Non-Fatal) error received: 0000:09:00.0
> [369145.923124] igb 0000:09:00.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> [369145.936791] igb 0000:09:00.0: AER: device [8086:1537] error status/mask=00004000/00000000
> [369145.947068] igb 0000:09:00.0: AER: [14] CmpltTO
> [369145.954602] igb 0000:09:00.0: Non-correctable non-fatal error reported.
> [369145.984564] ------------[ cut here ]------------
> [369145.990285] kernel BUG at include/linux/netdevice.h:529!
> [369145.996860] invalid opcode: 0000 [#1] SMP PTI
> [369146.002267] CPU: 3 PID: 142 Comm: irq/26-aerdrv Kdump: loaded Tainted: G O 5.4.251-yocto-standard #1
> [369146.015073] Hardware name: Cisco System Inc. SF-D8/Type2 - Board Product Name1, BIOS 1-29-g46d9e72a-s 05/03/2019
> [369146.027570] RIP: 0010:igb_up+0x51/0x160 [igb]
> [369146.032974] Code: d2 eb 16 f0 80 60 60 fe f0 80 60 60 f7 48 83 c2 01 39 93 14 02 00 00 76 13 48 8b 84 d3 08 0f 00 00 48 8b 48 60 83 e1 01 75 d9 <0f> 0b f6 83 11 02 00 00 20 0f 85 c0 00 00 00 48 8b bb 08 0f 00 00
> [369146.055938] RSP: 0018:ffffb29b0045bcf0 EFLAGS: 00010246
> [369146.062399] RAX: ffff8d938d99e400 RBX: ffff8d9398a08740 RCX: 0000000000000000
> [369146.071186] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff8d938d99e2c0
> [369146.079974] RBP: ffffb29b0045bd00 R08: 00000000000002ec R09: ffffb29b002fa000
> [369146.088761] R10: 0000000000006f80 R11: 000000000090ef2c R12: ffff8d9398a08000
> [369146.097547] R13: ffff8d9398a08740 R14: ffff8d939ae92c00 R15: ffff8d939ae92c28
> [369146.106335] FS: 0000000000000000(0000) GS:ffff8d939fcc0000(0000) knlGS:0000000000000000
> [369146.116286] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [369146.123381] CR2: 000056318eb418e8 CR3: 000000084e744002 CR4: 00000000003606e0
> [369146.132170] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [369146.140960] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [369146.149747] Call Trace:
> [369146.152826] ? show_regs.cold+0x1a/0x1f
> [369146.157592] ? __die+0x90/0xd9
> [369146.161408] ? die+0x30/0x50
> [369146.165002] ? do_trap+0x85/0xf0
> [369146.169029] ? do_error_trap+0x7c/0xb0
> [369146.173695] ? igb_up+0x51/0x160 [igb]
> [369146.178355] ? do_invalid_op+0x3c/0x50
> [369146.183019] ? igb_up+0x51/0x160 [igb]
> [369146.187681] ? invalid_op+0x7c/0x90
> [369146.192027] ? igb_up+0x51/0x160 [igb]
> [369146.196691] ? hrtimer_try_to_cancel+0x2c/0x110
> [369146.202305] ? schedule_hrtimeout_range_clock+0xa0/0x110
> [369146.208871] ? hrtimer_init_sleeper+0x90/0x90
> [369146.214277] ? igb_rx_fifo_flush_82575+0x32/0x270 [igb]
> [369146.220738] ? igb_configure+0x417/0x650 [igb]
> [369146.226247] ? igb_up+0x51/0x160 [igb]
> [369146.230913] igb_io_resume+0x31/0x50 [igb]
> [369146.235998] report_resume+0x5c/0x80
> [369146.240449] ? pcie_portdrv_probe+0x70/0x70
> [369146.245631] pci_walk_bus+0x75/0x90
> [369146.249966] pcie_do_recovery+0x163/0x280
> [369146.254947] aer_process_err_devices+0xa2/0xd1
> [369146.260455] aer_isr.cold+0x52/0xa1
> [369146.264799] ? __schedule+0x2bf/0x680
> [369146.269349] ? irq_finalize_oneshot+0xf0/0xf0
> [369146.274754] irq_thread_fn+0x28/0x50
> [369146.279205] irq_thread+0xf8/0x180
> [369146.283445] ? wake_threads_waitq+0x30/0x30
> [369146.288638] kthread+0x104/0x140
> [369146.292666] ? irq_thread_check_affinity+0x80/0x80
> [369146.298597] ? __kthread_cancel_work+0x40/0x40
> [369146.304106] ret_from_fork+0x35/0x40
> - - - snip - - -
>
> BUG_ON() comes from
>
> - - - snip - - -
> static inline void napi_enable(struct napi_struct *n)
> {
>         BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
>         smp_mb__before_atomic();
>         clear_bit(NAPI_STATE_SCHED, &n->state);
>         clear_bit(NAPI_STATE_NPSVC, &n->state);
> }
> - - - snip - - -
>
> The stack-trace shows that the AER handler starts here and goes into igb_up():
>
> - - - snip - - -
> static void igb_io_resume(struct pci_dev *pdev)
> {
>         struct net_device *netdev = pci_get_drvdata(pdev);
>         struct igb_adapter *adapter = netdev_priv(netdev);
>
>         if (netif_running(netdev)) {
>                 if (igb_up(adapter)) {
>                         dev_err(&pdev->dev, "igb_up failed after reset\n");
>                         return;
>                 }
>         }
>
>         ...
> }
> - - - snip - - -
> Three functions come into the picture:
>
> igb_io_error_detected() { // runs upon AER detection
>         igb_down(); // before this commit
>         noop;       // with this commit
> }
>
> igb_io_resume() { // runs upon AER handling
>         igb_up()
> }
>
> igb_remove() { // runs upon rmmod igb, or surprise down removal (as shown in the commit log)
>         igb_down();
> }
>
> Before this commit, the flow for an AER on 09:00.0 was
> igb_down; // from igb_io_error_detect() - this sets the NAPI_STATE_SCHED bit
> igb_up;   // from igb_io_resume() - that works as it finds a set NAPI_STATE_SCHED bit (and then clears it)
>
> Now with this commit, the flow for an AER on 09:00.0 is
> noop;     // from igb_io_error_detect() - the NAPI_STATE_SCHED bit is not touched, e.g. it remains cleared
> igb_up;   // from igb_io_resume() - BUG_ON() is triggered as it finds a cleared NAPI_STATE_SCHED bit
>
>
> I don't have a means to reproduce this i210 on thunderbolt issue and don't completely understand its flow. Before this commit, the "expected" flow for i210 on thunderbolt was probably
>
> igb_down; // from igb_io_error_detect() - this sets the NAPI_STATE_SCHED bit
> igb_up;   // from igb_io_resume() - that works as it finds a set NAPI_STATE_SCHED bit and cleans it.
> igb_down; // from igb_remove() - that works b/c of the previous igb_up()
>
> The bug report shows that this was not the case. I don't completely understand the reported problem: what happens to and in igb_io_resume() and its call to igb_up()? In the commit log backtrace, I see igb_remove() eventually calling __dev_close_many() which clears the __LINK__STATE_START bit. igb_io_resume() does check this bit via netif_running() and calls only then igb_up(). My guess is igb_io_resume() executes after igb_remove() and skips therefore the execution of igb_up(). As we see from the commit log backtrace, that makes igb_remove() starve now in igb_down()->napi_synchronize().
>
> How to fix that, I don't know.
>
> Stefan
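
To make the NAPI_STATE_SCHED bookkeeping discussed in this thread easier to follow, here is a minimal userspace sketch. It is not kernel code and not a proposed fix. The names `napi_model`, `model_down`, `model_up`, and the `MODEL_*` results are hypothetical and exist only for illustration. It assumes, as described in the thread, that the igb_down path sets the bit and waits if it is already set, and that the igb_up path (through napi_enable in 5.4) requires the bit to be set and then clears it.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, NOT kernel code: a single flag stands in
 * for the NAPI_STATE_SCHED bit discussed in the thread.
 */
struct napi_model {
	bool sched;
};

enum model_result {
	MODEL_OK,   /* the call would complete */
	MODEL_HANG, /* the kernel would keep sleeping, waiting for the bit to clear */
	MODEL_BUG   /* the 5.4 kernel would hit BUG_ON() in napi_enable() */
};

/* Models the igb_down() path: it sets the bit, and waits if it is already set. */
static enum model_result model_down(struct napi_model *n)
{
	if (n->sched)
		return MODEL_HANG; /* nobody is left to clear the bit */
	n->sched = true;
	return MODEL_OK;
}

/* Models the igb_up() path: the bit must be set, and is then cleared. */
static enum model_result model_up(struct napi_model *n)
{
	if (!n->sched)
		return MODEL_BUG;
	n->sched = false;
	return MODEL_OK;
}
```

In this model, down followed by up succeeds (the flow before the commit), up without a preceding down reports `MODEL_BUG` (the aer_inject case with the commit), and two downs in a row report `MODEL_HANG` (the Thunderbolt unplug case from the original commit log).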