2023-05-18 07:38:54

by Ying Hsu

[permalink] [raw]
Subject: [PATCH] igb: Fix igb_down hung on surprise removal

In a setup where a Thunderbolt hub connects to Ethernet and a display
through USB Type-C, users may experience a hung task timeout when they
remove the cable between the PC and the Thunderbolt hub.
This is because the igb_down function is called multiple times when
the Thunderbolt hub is unplugged. For example, the igb_io_error_detected
triggers the first call, and the igb_remove triggers the second call.
The second call to igb_down will block at napi_synchronize.
Here's the call trace:
__schedule+0x3b0/0xddb
? __mod_timer+0x164/0x5d3
schedule+0x44/0xa8
schedule_timeout+0xb2/0x2a4
? run_local_timers+0x4e/0x4e
msleep+0x31/0x38
igb_down+0x12c/0x22a [igb 6615058754948bfde0bf01429257eb59f13030d4]
__igb_close+0x6f/0x9c [igb 6615058754948bfde0bf01429257eb59f13030d4]
igb_close+0x23/0x2b [igb 6615058754948bfde0bf01429257eb59f13030d4]
__dev_close_many+0x95/0xec
dev_close_many+0x6e/0x103
unregister_netdevice_many+0x105/0x5b1
unregister_netdevice_queue+0xc2/0x10d
unregister_netdev+0x1c/0x23
igb_remove+0xa7/0x11c [igb 6615058754948bfde0bf01429257eb59f13030d4]
pci_device_remove+0x3f/0x9c
device_release_driver_internal+0xfe/0x1b4
pci_stop_bus_device+0x5b/0x7f
pci_stop_bus_device+0x30/0x7f
pci_stop_bus_device+0x30/0x7f
pci_stop_and_remove_bus_device+0x12/0x19
pciehp_unconfigure_device+0x76/0xe9
pciehp_disable_slot+0x6e/0x131
pciehp_handle_presence_or_link_change+0x7a/0x3f7
pciehp_ist+0xbe/0x194
irq_thread_fn+0x22/0x4d
? irq_thread+0x1fd/0x1fd
irq_thread+0x17b/0x1fd
? irq_forced_thread_fn+0x5f/0x5f
kthread+0x142/0x153
? __irq_get_irqchip_state+0x46/0x46
? kthread_associate_blkcg+0x71/0x71
ret_from_fork+0x1f/0x30

In this case, igb_io_error_detected detaches the network interface
and requests a PCIE slot reset, however, the PCIE reset callback is
not being invoked and thus the Ethernet connection breaks down.
As the PCIE error in this case is a non-fatal one, requesting a
slot reset can be avoided.
This patch fixes the task hung issue and preserves Ethernet
connection by ignoring non-fatal PCIE errors.

Signed-off-by: Ying Hsu <[email protected]>
---
This commit has been tested on a HP Elite Dragonfly Chromebook and
a Caldigit TS3+ Thunderbolt hub. The Ethernet driver for the hub
is igb. Non-fatal PCIE errors happen when users hot-plug the cables
connected to the chromebook or to the external display.

drivers/net/ethernet/intel/igb/igb_main.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 58872a4c2540..a8b217368ca1 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -9581,6 +9581,11 @@ static pci_ers_result_t igb_io_error_detected(struct pci_dev *pdev,
struct net_device *netdev = pci_get_drvdata(pdev);
struct igb_adapter *adapter = netdev_priv(netdev);

+ if (state == pci_channel_io_normal) {
+ dev_warn(&pdev->dev, "Non-correctable non-fatal error reported.\n");
+ return PCI_ERS_RESULT_CAN_RECOVER;
+ }
+
netif_device_detach(netdev);

if (state == pci_channel_io_perm_failure)
--
2.40.1.606.ga4b1b128d6-goog



2023-05-18 10:49:53

by Pavan Chebbi

[permalink] [raw]
Subject: Re: [PATCH] igb: Fix igb_down hung on surprise removal

On Thu, May 18, 2023 at 12:58 PM Ying Hsu <[email protected]> wrote:
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index 58872a4c2540..a8b217368ca1 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -9581,6 +9581,11 @@ static pci_ers_result_t igb_io_error_detected(struct pci_dev *pdev,
> struct net_device *netdev = pci_get_drvdata(pdev);
> struct igb_adapter *adapter = netdev_priv(netdev);
>
> + if (state == pci_channel_io_normal) {
> + dev_warn(&pdev->dev, "Non-correctable non-fatal error reported.\n");
> + return PCI_ERS_RESULT_CAN_RECOVER;
> + }
> +

This code may be good to have. But not sure if this should be the fix
for igb_down() synchronization.
Intel guys may comment.

> netif_device_detach(netdev);
>
> if (state == pci_channel_io_perm_failure)
> --
> 2.40.1.606.ga4b1b128d6-goog
>
>


Attachments:
smime.p7s (4.11 kB)
S/MIME Cryptographic Signature

2023-05-22 20:29:54

by Grant Grundler

[permalink] [raw]
Subject: Re: [PATCH] igb: Fix igb_down hung on surprise removal

On Thu, May 18, 2023 at 3:36 AM Pavan Chebbi <[email protected]> wrote:
>
> On Thu, May 18, 2023 at 12:58 PM Ying Hsu <[email protected]> wrote:
> >
> > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> > index 58872a4c2540..a8b217368ca1 100644
> > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > @@ -9581,6 +9581,11 @@ static pci_ers_result_t igb_io_error_detected(struct pci_dev *pdev,
> > struct net_device *netdev = pci_get_drvdata(pdev);
> > struct igb_adapter *adapter = netdev_priv(netdev);
> >
> > + if (state == pci_channel_io_normal) {
> > + dev_warn(&pdev->dev, "Non-correctable non-fatal error reported.\n");
> > + return PCI_ERS_RESULT_CAN_RECOVER;
> > + }
> > +
>
> This code may be good to have. But not sure if this should be the fix
> for igb_down() synchronization.

I have the same opinion. This appears to solve the problem - but I
don't know if there is a better way to solve this problem.

> Intel guys may comment.

Ping? Can we please get feedback from IGB/IGC maintainers this week?

(I hope igc maintainers can confirm this isn't an issue for igc.)

cheers,
grant

>
> > netif_device_detach(netdev);
> >
> > if (state == pci_channel_io_perm_failure)
> > --
> > 2.40.1.606.ga4b1b128d6-goog
> >
> >

2023-05-23 18:19:01

by Tony Nguyen

[permalink] [raw]
Subject: Re: [PATCH] igb: Fix igb_down hung on surprise removal

On 5/22/2023 1:16 PM, Grant Grundler wrote:
> On Thu, May 18, 2023 at 3:36 AM Pavan Chebbi <[email protected]> wrote:
>>
>> On Thu, May 18, 2023 at 12:58 PM Ying Hsu <[email protected]> wrote:
>>>
>>> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
>>> index 58872a4c2540..a8b217368ca1 100644
>>> --- a/drivers/net/ethernet/intel/igb/igb_main.c
>>> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
>>> @@ -9581,6 +9581,11 @@ static pci_ers_result_t igb_io_error_detected(struct pci_dev *pdev,
>>> struct net_device *netdev = pci_get_drvdata(pdev);
>>> struct igb_adapter *adapter = netdev_priv(netdev);
>>>
>>> + if (state == pci_channel_io_normal) {
>>> + dev_warn(&pdev->dev, "Non-correctable non-fatal error reported.\n");
>>> + return PCI_ERS_RESULT_CAN_RECOVER;
>>> + }
>>> +
>>
>> This code may be good to have. But not sure if this should be the fix
>> for igb_down() synchronization.
>
> I have the same opinion. This appears to solve the problem - but I
> don't know if there is a better way to solve this problem.
>
>> Intel guys may comment.
>
> Ping? Can we please get feedback from IGB/IGC maintainers this week?
>
> (I hope igc maintainers can confirm this isn't an issue for igc.)

Adding some of the igb and igc developers.

> cheers,
> grant
>
>>
>>> netif_device_detach(netdev);
>>>
>>> if (state == pci_channel_io_perm_failure)
>>> --
>>> 2.40.1.606.ga4b1b128d6-goog
>>>
>>>

2023-05-24 12:47:01

by Loktionov, Aleksandr

[permalink] [raw]
Subject: RE: [PATCH] igb: Fix igb_down hung on surprise removal

Good day Tony

We reviewed the patch and have nothing against.

With the best regards
Alex
ND ITP Linux 40G base driver TL



> -----Original Message-----
> From: Nguyen, Anthony L <[email protected]>
> Sent: Tuesday, May 23, 2023 8:04 PM
> To: Grant Grundler <[email protected]>; Pavan Chebbi
> <[email protected]>; Loktionov, Aleksandr
> <[email protected]>; Neftin, Sasha <[email protected]>;
> Ruinskiy, Dima <[email protected]>
> Cc: Ying Hsu <[email protected]>; [email protected]; David S.
> Miller <[email protected]>; Eric Dumazet <[email protected]>;
> Jakub Kicinski <[email protected]>; Brandeburg, Jesse
> <[email protected]>; Paolo Abeni <[email protected]>; intel-
> [email protected]; [email protected]
> Subject: Re: [PATCH] igb: Fix igb_down hung on surprise removal
>
> On 5/22/2023 1:16 PM, Grant Grundler wrote:
> > On Thu, May 18, 2023 at 3:36 AM Pavan Chebbi
> <[email protected]> wrote:
> >>
> >> On Thu, May 18, 2023 at 12:58 PM Ying Hsu <[email protected]>
> wrote:
> >>>
> >>> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> >>> b/drivers/net/ethernet/intel/igb/igb_main.c
> >>> index 58872a4c2540..a8b217368ca1 100644
> >>> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> >>> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> >>> @@ -9581,6 +9581,11 @@ static pci_ers_result_t
> igb_io_error_detected(struct pci_dev *pdev,
> >>> struct net_device *netdev = pci_get_drvdata(pdev);
> >>> struct igb_adapter *adapter = netdev_priv(netdev);
> >>>
> >>> + if (state == pci_channel_io_normal) {
> >>> + dev_warn(&pdev->dev, "Non-correctable non-fatal error
> reported.\n");
> >>> + return PCI_ERS_RESULT_CAN_RECOVER;
> >>> + }
> >>> +
> >>
> >> This code may be good to have. But not sure if this should be the fix
> >> for igb_down() synchronization.
> >
> > I have the same opinion. This appears to solve the problem - but I
> > don't know if there is a better way to solve this problem.
> >
> >> Intel guys may comment.
> >
> > Ping? Can we please get feedback from IGB/IGC maintainers this week?
> >
> > (I hope igc maintainers can confirm this isn't an issue for igc.)
>
> Adding some of the igb and igc developers.
>
> > cheers,
> > grant
> >
> >>
> >>> netif_device_detach(netdev);
> >>>
> >>> if (state == pci_channel_io_perm_failure)
> >>> --
> >>> 2.40.1.606.ga4b1b128d6-goog
> >>>
> >>>

2023-05-24 21:13:49

by Grant Grundler

[permalink] [raw]
Subject: Re: [PATCH] igb: Fix igb_down hung on surprise removal

On Wed, May 24, 2023 at 5:34 AM Loktionov, Aleksandr
<[email protected]> wrote:
>
> Good day Tony
>
> We reviewed the patch and have nothing against.

Thank you for reviewing!

Can I take this as the equivalent of "Signed-off-by: Loktionov,
Aleksandr <[email protected]>"?

Or since Tony is listed in MAINTAINERS for drivers/net/ethernet/intel,
is he supposed to provide that?

cheers,
grant

>
> With the best regards
> Alex
> ND ITP Linux 40G base driver TL
>
>
>
> > -----Original Message-----
> > From: Nguyen, Anthony L <[email protected]>
> > Sent: Tuesday, May 23, 2023 8:04 PM
> > To: Grant Grundler <[email protected]>; Pavan Chebbi
> > <[email protected]>; Loktionov, Aleksandr
> > <[email protected]>; Neftin, Sasha <[email protected]>;
> > Ruinskiy, Dima <[email protected]>
> > Cc: Ying Hsu <[email protected]>; [email protected]; David S.
> > Miller <[email protected]>; Eric Dumazet <[email protected]>;
> > Jakub Kicinski <[email protected]>; Brandeburg, Jesse
> > <[email protected]>; Paolo Abeni <[email protected]>; intel-
> > [email protected]; [email protected]
> > Subject: Re: [PATCH] igb: Fix igb_down hung on surprise removal
> >
> > On 5/22/2023 1:16 PM, Grant Grundler wrote:
> > > On Thu, May 18, 2023 at 3:36 AM Pavan Chebbi
> > <[email protected]> wrote:
> > >>
> > >> On Thu, May 18, 2023 at 12:58 PM Ying Hsu <[email protected]>
> > wrote:
> > >>>
> > >>> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> > >>> b/drivers/net/ethernet/intel/igb/igb_main.c
> > >>> index 58872a4c2540..a8b217368ca1 100644
> > >>> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > >>> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > >>> @@ -9581,6 +9581,11 @@ static pci_ers_result_t
> > igb_io_error_detected(struct pci_dev *pdev,
> > >>> struct net_device *netdev = pci_get_drvdata(pdev);
> > >>> struct igb_adapter *adapter = netdev_priv(netdev);
> > >>>
> > >>> + if (state == pci_channel_io_normal) {
> > >>> + dev_warn(&pdev->dev, "Non-correctable non-fatal error
> > reported.\n");
> > >>> + return PCI_ERS_RESULT_CAN_RECOVER;
> > >>> + }
> > >>> +
> > >>
> > >> This code may be good to have. But not sure if this should be the fix
> > >> for igb_down() synchronization.
> > >
> > > I have the same opinion. This appears to solve the problem - but I
> > > don't know if there is a better way to solve this problem.
> > >
> > >> Intel guys may comment.
> > >
> > > Ping? Can we please get feedback from IGB/IGC maintainers this week?
> > >
> > > (I hope igc maintainers can confirm this isn't an issue for igc.)
> >
> > Adding some of the igb and igc developers.
> >
> > > cheers,
> > > grant
> > >
> > >>
> > >>> netif_device_detach(netdev);
> > >>>
> > >>> if (state == pci_channel_io_perm_failure)
> > >>> --
> > >>> 2.40.1.606.ga4b1b128d6-goog
> > >>>
> > >>>

2023-05-24 23:01:09

by Grant Grundler

[permalink] [raw]
Subject: Re: [PATCH] igb: Fix igb_down hung on surprise removal

On Wed, May 24, 2023 at 3:22 PM Tony Nguyen <[email protected]> wrote:
>
> Hi Grant,
>
> On 5/24/2023 2:01 PM, Grant Grundler wrote:
> > On Wed, May 24, 2023 at 5:34 AM Loktionov, Aleksandr
> > <[email protected]> wrote:
> >>
> >> Good day Tony
> >>
> >> We reviewed the patch and have nothing against.
> >
> > Thank you for reviewing!
> >
> > Can I take this as the equivalent of "Signed-off-by: Loktionov,
> > Aleksandr <[email protected]>"?
>
> Unless a tag is explicitly given, I don't believe one can be inferred.

Yes - that's what I thought and was asking in case that's what
Aleksandr meant (and could easily confirm)

>
> > Or since Tony is listed in MAINTAINERS for drivers/net/ethernet/intel,
> > is he supposed to provide that?
>
> Assuming there's no comments/issues brought up, I'll apply it to the
> respective Intel Wired LAN tree for our validation to have a pass at it.
> Upon successful completion, I'll send the patch on to netdev for them to
> pull. Hope that helps.

Yes - that's what I needed to know. Thank you Tony! :)

Ying Hsu will apply this patch to Chrome OS kernels:
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4548800

cheers,
grant

>
> Thanks,
> Tony

2023-05-24 23:10:52

by Tony Nguyen

[permalink] [raw]
Subject: Re: [PATCH] igb: Fix igb_down hung on surprise removal

Hi Grant,

On 5/24/2023 2:01 PM, Grant Grundler wrote:
> On Wed, May 24, 2023 at 5:34 AM Loktionov, Aleksandr
> <[email protected]> wrote:
>>
>> Good day Tony
>>
>> We reviewed the patch and have nothing against.
>
> Thank you for reviewing!
>
> Can I take this as the equivalent of "Signed-off-by: Loktionov,
> Aleksandr <[email protected]>"?

Unless a tag is explicitly given, I don't believe one can be inferred.

> Or since Tony is listed in MAINTAINERS for drivers/net/ethernet/intel,
> is he supposed to provide that?

Assuming there's no comments/issues brought up, I'll apply it to the
respective Intel Wired LAN tree for our validation to have a pass at it.
Upon successful completion, I'll send the patch on to netdev for them to
pull. Hope that helps.

Thanks,
Tony

2023-06-05 05:12:54

by Pucha, HimasekharX Reddy

[permalink] [raw]
Subject: RE: [Intel-wired-lan] [PATCH] igb: Fix igb_down hung on surprise removal

> -----Original Message-----
> From: Intel-wired-lan <[email protected]> On Behalf Of Ying Hsu
> Sent: Thursday, May 18, 2023 12:57 PM
> To: [email protected]
> Cc: [email protected]; [email protected]; Ying Hsu <[email protected]>; Brandeburg, Jesse <[email protected]>; [email protected]; Eric Dumazet <[email protected]>; Nguyen, Anthony L <[email protected]>; Jakub Kicinski <[email protected]>; Paolo Abeni <[email protected]>; David S. Miller <[email protected]>
> Subject: [Intel-wired-lan] [PATCH] igb: Fix igb_down hung on surprise removal
>
> In a setup where a Thunderbolt hub connects to Ethernet and a display through USB Type-C, users may experience a hung task timeout when they remove the cable between the PC and the Thunderbolt hub.
> This is because the igb_down function is called multiple times when the Thunderbolt hub is unplugged. For example, the igb_io_error_detected triggers the first call, and the igb_remove triggers the second call.
> The second call to igb_down will block at napi_synchronize.
> Here's the call trace:
> __schedule+0x3b0/0xddb
> ? __mod_timer+0x164/0x5d3
> schedule+0x44/0xa8
> schedule_timeout+0xb2/0x2a4
> ? run_local_timers+0x4e/0x4e
> msleep+0x31/0x38
> igb_down+0x12c/0x22a [igb 6615058754948bfde0bf01429257eb59f13030d4]
> __igb_close+0x6f/0x9c [igb 6615058754948bfde0bf01429257eb59f13030d4]
> igb_close+0x23/0x2b [igb 6615058754948bfde0bf01429257eb59f13030d4]
> __dev_close_many+0x95/0xec
> dev_close_many+0x6e/0x103
> unregister_netdevice_many+0x105/0x5b1
> unregister_netdevice_queue+0xc2/0x10d
> unregister_netdev+0x1c/0x23
> igb_remove+0xa7/0x11c [igb 6615058754948bfde0bf01429257eb59f13030d4]
> pci_device_remove+0x3f/0x9c
> device_release_driver_internal+0xfe/0x1b4
> pci_stop_bus_device+0x5b/0x7f
> pci_stop_bus_device+0x30/0x7f
> pci_stop_bus_device+0x30/0x7f
> pci_stop_and_remove_bus_device+0x12/0x19
> pciehp_unconfigure_device+0x76/0xe9
> pciehp_disable_slot+0x6e/0x131
> pciehp_handle_presence_or_link_change+0x7a/0x3f7
> pciehp_ist+0xbe/0x194
> irq_thread_fn+0x22/0x4d
> ? irq_thread+0x1fd/0x1fd
> irq_thread+0x17b/0x1fd
> ? irq_forced_thread_fn+0x5f/0x5f
> kthread+0x142/0x153
> ? __irq_get_irqchip_state+0x46/0x46
> ? kthread_associate_blkcg+0x71/0x71
> ret_from_fork+0x1f/0x30
>
> In this case, igb_io_error_detected detaches the network interface and requests a PCIE slot reset, however, the PCIE reset callback is not being invoked and thus the Ethernet connection breaks down.
> As the PCIE error in this case is a non-fatal one, requesting a slot reset can be avoided.
> This patch fixes the task hung issue and preserves Ethernet connection by ignoring non-fatal PCIE errors.
>
> Signed-off-by: Ying Hsu <[email protected]>
> ---
> This commit has been tested on a HP Elite Dragonfly Chromebook and a Caldigit TS3+ Thunderbolt hub. The Ethernet driver for the hub is igb. Non-fatal PCIE errors happen when users hot-plug the cables connected to the chromebook or to the external display.
>
> drivers/net/ethernet/intel/igb/igb_main.c | 5 +++++
> 1 file changed, 5 insertions(+)
>

Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)


2024-06-06 00:05:27

by Stefan Schaeckeler

[permalink] [raw]
Subject: Re: [PATCH] igb: Fix igb_down hung on surprise removal

Hi all,

This commit introduced a regression. What does not work with this commit is an AER without a surprise removal event.


On 5/18/23 00:26, Ying Hsu wrote:
> In a setup where a Thunderbolt hub connects to Ethernet and a display
> through USB Type-C, users may experience a hung task timeout when they
> remove the cable between the PC and the Thunderbolt hub.
> This is because the igb_down function is called multiple times when
> the Thunderbolt hub is unplugged. For example, the igb_io_error_detected
> triggers the first call, and the igb_remove triggers the second call.
> The second call to igb_down will block at napi_synchronize.
> Here's the call trace:
> __schedule+0x3b0/0xddb
> ? __mod_timer+0x164/0x5d3
> schedule+0x44/0xa8
> schedule_timeout+0xb2/0x2a4
> ? run_local_timers+0x4e/0x4e
> msleep+0x31/0x38
> igb_down+0x12c/0x22a [igb 6615058754948bfde0bf01429257eb59f13030d4]
> __igb_close+0x6f/0x9c [igb 6615058754948bfde0bf01429257eb59f13030d4]
> igb_close+0x23/0x2b [igb 6615058754948bfde0bf01429257eb59f13030d4]
> __dev_close_many+0x95/0xec
> dev_close_many+0x6e/0x103
> unregister_netdevice_many+0x105/0x5b1
> unregister_netdevice_queue+0xc2/0x10d
> unregister_netdev+0x1c/0x23
> igb_remove+0xa7/0x11c [igb 6615058754948bfde0bf01429257eb59f13030d4]
> pci_device_remove+0x3f/0x9c
> device_release_driver_internal+0xfe/0x1b4
> pci_stop_bus_device+0x5b/0x7f
> pci_stop_bus_device+0x30/0x7f
> pci_stop_bus_device+0x30/0x7f
> pci_stop_and_remove_bus_device+0x12/0x19
> pciehp_unconfigure_device+0x76/0xe9
> pciehp_disable_slot+0x6e/0x131
> pciehp_handle_presence_or_link_change+0x7a/0x3f7
> pciehp_ist+0xbe/0x194
> irq_thread_fn+0x22/0x4d
> ? irq_thread+0x1fd/0x1fd
> irq_thread+0x17b/0x1fd
> ? irq_forced_thread_fn+0x5f/0x5f
> kthread+0x142/0x153
> ? __irq_get_irqchip_state+0x46/0x46
> ? kthread_associate_blkcg+0x71/0x71
> ret_from_fork+0x1f/0x30
>
> In this case, igb_io_error_detected detaches the network interface
> and requests a PCIE slot reset, however, the PCIE reset callback is
> not being invoked and thus the Ethernet connection breaks down.
> As the PCIE error in this case is a non-fatal one, requesting a
> slot reset can be avoided.
> This patch fixes the task hung issue and preserves Ethernet
> connection by ignoring non-fatal PCIE errors.
>
> Signed-off-by: Ying Hsu <[email protected]>
> ---
> This commit has been tested on a HP Elite Dragonfly Chromebook and
> a Caldigit TS3+ Thunderbolt hub. The Ethernet driver for the hub
> is igb. Non-fatal PCIE errors happen when users hot-plug the cables
> connected to the chromebook or to the external display.
>
> drivers/net/ethernet/intel/igb/igb_main.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index 58872a4c2540..a8b217368ca1 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -9581,6 +9581,11 @@ static pci_ers_result_t igb_io_error_detected(struct pci_dev *pdev,
> struct net_device *netdev = pci_get_drvdata(pdev);
> struct igb_adapter *adapter = netdev_priv(netdev);
>
> + if (state == pci_channel_io_normal) {
> + dev_warn(&pdev->dev, "Non-correctable non-fatal error reported.\n");
> + return PCI_ERS_RESULT_CAN_RECOVER;
> + }
> +
> netif_device_detach(netdev);
>
> if (state == pci_channel_io_perm_failure)

We are currently stuck with the 5.4 kernel - our embedded system can't easily boot arbitrary kernel versions. The igb driver code has not changed and I'm quite positive the issue still persist in the latest upstream kernel.

This issue is reproducible with aer_inject. 09:00.0 is our i210 which is directly connected to the cpu root port 00:01.1:


- - - snip - - -
[node0_RP0_CPU0:~]$cat aer.log
AER
PCI_ID 09:00.0
UNCOR_STATUS COMP_TIME
HEADER_LOG 0 1 2 3
- - - snip - - -

- - - snip - - -
[node0_RP0_CPU0:~]$aer-inject aer.log
[369145.900845] pcieport 0000:00:01.1: aer_inject: Injecting errors 00000000/00004000 into device 0000:09:00.0
[369145.912726] pcieport 0000:00:01.1: AER: Uncorrected (Non-Fatal) error received: 0000:09:00.0
[369145.923124] igb 0000:09:00.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[369145.936791] igb 0000:09:00.0: AER: device [8086:1537] error status/mask=00004000/00000000
[369145.947068] igb 0000:09:00.0: AER: [14] CmpltTO
[369145.954602] igb 0000:09:00.0: Non-correctable non-fatal error reported.
[369145.984564] ------------[ cut here ]------------
[369145.990285] kernel BUG at include/linux/netdevice.h:529!
[369145.996860] invalid opcode: 0000 [#1] SMP PTI
[369146.002267] CPU: 3 PID: 142 Comm: irq/26-aerdrv Kdump: loaded Tainted: G O 5.4.251-yocto-standard #1
[369146.015073] Hardware name: Cisco System Inc. SF-D8/Type2 - Board Product Name1, BIOS 1-29-g46d9e72a-s 05/03/2019
[369146.027570] RIP: 0010:igb_up+0x51/0x160 [igb]
[369146.032974] Code: d2 eb 16 f0 80 60 60 fe f0 80 60 60 f7 48 83 c2 01 39 93 14 02 00 00 76 13 48 8b 84 d3 08 0f 00 00 48 8b 48 60 83 e1 01 75 d9 <0f> 0b f6 83 11 02 00 00 20 0f 85 c0 00 00 00 48 8b bb 08 0f 00 00
[369146.055938] RSP: 0018:ffffb29b0045bcf0 EFLAGS: 00010246
[369146.062399] RAX: ffff8d938d99e400 RBX: ffff8d9398a08740 RCX: 0000000000000000
[369146.071186] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff8d938d99e2c0
[369146.079974] RBP: ffffb29b0045bd00 R08: 00000000000002ec R09: ffffb29b002fa000
[369146.088761] R10: 0000000000006f80 R11: 000000000090ef2c R12: ffff8d9398a08000
[369146.097547] R13: ffff8d9398a08740 R14: ffff8d939ae92c00 R15: ffff8d939ae92c28
[369146.106335] FS: 0000000000000000(0000) GS:ffff8d939fcc0000(0000) knlGS:0000000000000000
[369146.116286] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[369146.123381] CR2: 000056318eb418e8 CR3: 000000084e744002 CR4: 00000000003606e0
[369146.132170] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[369146.140960] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[369146.149747] Call Trace:
[369146.152826] ? show_regs.cold+0x1a/0x1f
[369146.157592] ? __die+0x90/0xd9
[369146.161408] ? die+0x30/0x50
[369146.165002] ? do_trap+0x85/0xf0
[369146.169029] ? do_error_trap+0x7c/0xb0
[369146.173695] ? igb_up+0x51/0x160 [igb]
[369146.178355] ? do_invalid_op+0x3c/0x50
[369146.183019] ? igb_up+0x51/0x160 [igb]
[369146.187681] ? invalid_op+0x7c/0x90
[369146.192027] ? igb_up+0x51/0x160 [igb]
[369146.196691] ? hrtimer_try_to_cancel+0x2c/0x110
[369146.202305] ? schedule_hrtimeout_range_clock+0xa0/0x110
[369146.208871] ? hrtimer_init_sleeper+0x90/0x90
[369146.214277] ? igb_rx_fifo_flush_82575+0x32/0x270 [igb]
[369146.220738] ? igb_configure+0x417/0x650 [igb]
[369146.226247] ? igb_up+0x51/0x160 [igb]
[369146.230913] igb_io_resume+0x31/0x50 [igb]
[369146.235998] report_resume+0x5c/0x80
[369146.240449] ? pcie_portdrv_probe+0x70/0x70
[369146.245631] pci_walk_bus+0x75/0x90
[369146.249966] pcie_do_recovery+0x163/0x280
[369146.254947] aer_process_err_devices+0xa2/0xd1
[369146.260455] aer_isr.cold+0x52/0xa1
[369146.264799] ? __schedule+0x2bf/0x680
[369146.269349] ? irq_finalize_oneshot+0xf0/0xf0
[369146.274754] irq_thread_fn+0x28/0x50
[369146.279205] irq_thread+0xf8/0x180
[369146.283445] ? wake_threads_waitq+0x30/0x30
[369146.288638] kthread+0x104/0x140
[369146.292666] ? irq_thread_check_affinity+0x80/0x80
[369146.298597] ? __kthread_cancel_work+0x40/0x40
[369146.304106] ret_from_fork+0x35/0x40
- - - snip - - -

BUG_ON() comes from

- - - snip - - -
static inline void napi_enable(struct napi_struct *n)
{
BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
smp_mb__before_atomic();
clear_bit(NAPI_STATE_SCHED, &n->state);
clear_bit(NAPI_STATE_NPSVC, &n->state);
}
- - - snip - - -

The stack-trace shows that the AER handler starts here and goes into igb_up():

- - - snip - - -
static void igb_io_resume(struct pci_dev *pdev)
{
struct net_device *netdev = pci_get_drvdata(pdev);
struct igb_adapter *adapter = netdev_priv(netdev);

if (netif_running(netdev)) {
if (igb_up(adapter)) {
dev_err(&pdev->dev, "igb_up failed after reset\n");
return;
}
}

...
}
- - - snip - - -
Three functions come into the picture:

igb_io_error_detected() { // runs upon AER detection
igb_down(); // before this commit
noop; // with this commit
}

igb_io_resume() { // runs upon AER handling
igb_up()
}

igb_remove() { // runs upon rmmod igb, or surprise down removal (as shown in the commit log)
igb_down();
}

Before this commit, the flow for an AER on 09:00.0 was
igb_down; // from igb_io_error_detect() - this sets the NAPI_STATE_SCHED bit
igb_up; // from igb_io_resume() - that works as it finds a set NAPI_STATE_SCHED bit (and then clears it)

Now with this commit, the flow for an AER on 09:00.0 is
noop; // from igb_io_error_detect() - the NAPI_STATE_SCHED bit is not touched, e.g. it remains cleared
igb_up; // from igb_io_resume() - BUG_ON() is triggered as it finds a cleared NAPI_STATE_SCHED bit


I don't have a means to reproduce this i210 on thunderbolt issue and don't completely understand its flow. Before this commit, the "expected" flow for i210 on thunderbolt was probably

igb_down; // from igb_io_error_detect() - this sets the NAPI_STATE_SCHED bit
igb_up; // from igb_io_resume() - that works as it finds a set NAPI_STATE_SCHED bit and cleans it.
igb_down; // from igb_remove() - that works b/c of the previous igb_up()

The bug report shows that this was not the case. I don't completely understand the reported problem: what happens to and in igb_io_resume() and its call to igb_up()? In the commit log backtrace, I see igb_remove() eventually calling __dev_close_many() which clears the __LINK__STATE_START bit. igb_io_resume() does check this bit via netif_running() and calls only then igb_up(). My guess is igb_io_resume() executes after igb_remove() and skips therefore the execution of igb_up(). As we see from the commit log backtrace, that makes igb_remove() starve now in igb_down()->napi_synchronize().

How to fix that, I don't know.

Stefan



2024-06-06 08:11:41

by Ying Hsu

[permalink] [raw]
Subject: Re: [PATCH] igb: Fix igb_down hung on surprise removal

On the CalDigit Thunderbolt Station 3 Plus, we've encountered an issue
when the USB downstream display connection state changes. The
problematic sequence observed is:
```
igb_io_error_detected
igb_down
igb_io_error_detected
igb_down
```

The second igb_down call blocks at napi_synchronize.
Simply avoiding redundant igb_down calls makes the Ethernet of the
thunderbolt dock unusable.

If Intel can identify when an Ethernet device is within a Thunderbolt
tunnel, the patch can be more specific.

On Thu, Jun 6, 2024 at 8:04 AM Stefan Schaeckeler <[email protected]> wrote:
>
> Hi all,
>
> This commit introduced a regression. What does not work with this commit is an AER without a surprise removal event.
>
>
> On 5/18/23 00:26, Ying Hsu wrote:
> > In a setup where a Thunderbolt hub connects to Ethernet and a display
> > through USB Type-C, users may experience a hung task timeout when they
> > remove the cable between the PC and the Thunderbolt hub.
> > This is because the igb_down function is called multiple times when
> > the Thunderbolt hub is unplugged. For example, the igb_io_error_detected
> > triggers the first call, and the igb_remove triggers the second call.
> > The second call to igb_down will block at napi_synchronize.
> > Here's the call trace:
> > __schedule+0x3b0/0xddb
> > ? __mod_timer+0x164/0x5d3
> > schedule+0x44/0xa8
> > schedule_timeout+0xb2/0x2a4
> > ? run_local_timers+0x4e/0x4e
> > msleep+0x31/0x38
> > igb_down+0x12c/0x22a [igb 6615058754948bfde0bf01429257eb59f13030d4]
> > __igb_close+0x6f/0x9c [igb 6615058754948bfde0bf01429257eb59f13030d4]
> > igb_close+0x23/0x2b [igb 6615058754948bfde0bf01429257eb59f13030d4]
> > __dev_close_many+0x95/0xec
> > dev_close_many+0x6e/0x103
> > unregister_netdevice_many+0x105/0x5b1
> > unregister_netdevice_queue+0xc2/0x10d
> > unregister_netdev+0x1c/0x23
> > igb_remove+0xa7/0x11c [igb 6615058754948bfde0bf01429257eb59f13030d4]
> > pci_device_remove+0x3f/0x9c
> > device_release_driver_internal+0xfe/0x1b4
> > pci_stop_bus_device+0x5b/0x7f
> > pci_stop_bus_device+0x30/0x7f
> > pci_stop_bus_device+0x30/0x7f
> > pci_stop_and_remove_bus_device+0x12/0x19
> > pciehp_unconfigure_device+0x76/0xe9
> > pciehp_disable_slot+0x6e/0x131
> > pciehp_handle_presence_or_link_change+0x7a/0x3f7
> > pciehp_ist+0xbe/0x194
> > irq_thread_fn+0x22/0x4d
> > ? irq_thread+0x1fd/0x1fd
> > irq_thread+0x17b/0x1fd
> > ? irq_forced_thread_fn+0x5f/0x5f
> > kthread+0x142/0x153
> > ? __irq_get_irqchip_state+0x46/0x46
> > ? kthread_associate_blkcg+0x71/0x71
> > ret_from_fork+0x1f/0x30
> >
> > In this case, igb_io_error_detected detaches the network interface
> > and requests a PCIE slot reset, however, the PCIE reset callback is
> > not being invoked and thus the Ethernet connection breaks down.
> > As the PCIE error in this case is a non-fatal one, requesting a
> > slot reset can be avoided.
> > This patch fixes the task hung issue and preserves Ethernet
> > connection by ignoring non-fatal PCIE errors.
> >
> > Signed-off-by: Ying Hsu <[email protected]>
> > ---
> > This commit has been tested on a HP Elite Dragonfly Chromebook and
> > a Caldigit TS3+ Thunderbolt hub. The Ethernet driver for the hub
> > is igb. Non-fatal PCIE errors happen when users hot-plug the cables
> > connected to the chromebook or to the external display.
> >
> > drivers/net/ethernet/intel/igb/igb_main.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> > index 58872a4c2540..a8b217368ca1 100644
> > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > @@ -9581,6 +9581,11 @@ static pci_ers_result_t igb_io_error_detected(struct pci_dev *pdev,
> > struct net_device *netdev = pci_get_drvdata(pdev);
> > struct igb_adapter *adapter = netdev_priv(netdev);
> >
> > + if (state == pci_channel_io_normal) {
> > + dev_warn(&pdev->dev, "Non-correctable non-fatal error reported.\n");
> > + return PCI_ERS_RESULT_CAN_RECOVER;
> > + }
> > +
> > netif_device_detach(netdev);
> >
> > if (state == pci_channel_io_perm_failure)
>
> We are currently stuck with the 5.4 kernel - our embedded system can't easily boot arbitrary kernel versions. The igb driver code has not changed and I'm quite positive the issue still persist in the latest upstream kernel.
>
> This issue is reproducible with aer_inject. 09:00.0 is our i210 which is directly connected to the cpu root port 00:01.1:
>
>
> - - - snip - - -
> [node0_RP0_CPU0:~]$cat aer.log
> AER
> PCI_ID 09:00.0
> UNCOR_STATUS COMP_TIME
> HEADER_LOG 0 1 2 3
> - - - snip - - -
>
> - - - snip - - -
> [node0_RP0_CPU0:~]$aer-inject aer.log
> [369145.900845] pcieport 0000:00:01.1: aer_inject: Injecting errors 00000000/00004000 into device 0000:09:00.0
> [369145.912726] pcieport 0000:00:01.1: AER: Uncorrected (Non-Fatal) error received: 0000:09:00.0
> [369145.923124] igb 0000:09:00.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> [369145.936791] igb 0000:09:00.0: AER: device [8086:1537] error status/mask=00004000/00000000
> [369145.947068] igb 0000:09:00.0: AER: [14] CmpltTO
> [369145.954602] igb 0000:09:00.0: Non-correctable non-fatal error reported.
> [369145.984564] ------------[ cut here ]------------
> [369145.990285] kernel BUG at include/linux/netdevice.h:529!
> [369145.996860] invalid opcode: 0000 [#1] SMP PTI
> [369146.002267] CPU: 3 PID: 142 Comm: irq/26-aerdrv Kdump: loaded Tainted: G O 5.4.251-yocto-standard #1
> [369146.015073] Hardware name: Cisco System Inc. SF-D8/Type2 - Board Product Name1, BIOS 1-29-g46d9e72a-s 05/03/2019
> [369146.027570] RIP: 0010:igb_up+0x51/0x160 [igb]
> [369146.032974] Code: d2 eb 16 f0 80 60 60 fe f0 80 60 60 f7 48 83 c2 01 39 93 14 02 00 00 76 13 48 8b 84 d3 08 0f 00 00 48 8b 48 60 83 e1 01 75 d9 <0f> 0b f6 83 11 02 00 00 20 0f 85 c0 00 00 00 48 8b bb 08 0f 00 00
> [369146.055938] RSP: 0018:ffffb29b0045bcf0 EFLAGS: 00010246
> [369146.062399] RAX: ffff8d938d99e400 RBX: ffff8d9398a08740 RCX: 0000000000000000
> [369146.071186] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff8d938d99e2c0
> [369146.079974] RBP: ffffb29b0045bd00 R08: 00000000000002ec R09: ffffb29b002fa000
> [369146.088761] R10: 0000000000006f80 R11: 000000000090ef2c R12: ffff8d9398a08000
> [369146.097547] R13: ffff8d9398a08740 R14: ffff8d939ae92c00 R15: ffff8d939ae92c28
> [369146.106335] FS: 0000000000000000(0000) GS:ffff8d939fcc0000(0000) knlGS:0000000000000000
> [369146.116286] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [369146.123381] CR2: 000056318eb418e8 CR3: 000000084e744002 CR4: 00000000003606e0
> [369146.132170] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [369146.140960] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [369146.149747] Call Trace:
> [369146.152826] ? show_regs.cold+0x1a/0x1f
> [369146.157592] ? __die+0x90/0xd9
> [369146.161408] ? die+0x30/0x50
> [369146.165002] ? do_trap+0x85/0xf0
> [369146.169029] ? do_error_trap+0x7c/0xb0
> [369146.173695] ? igb_up+0x51/0x160 [igb]
> [369146.178355] ? do_invalid_op+0x3c/0x50
> [369146.183019] ? igb_up+0x51/0x160 [igb]
> [369146.187681] ? invalid_op+0x7c/0x90
> [369146.192027] ? igb_up+0x51/0x160 [igb]
> [369146.196691] ? hrtimer_try_to_cancel+0x2c/0x110
> [369146.202305] ? schedule_hrtimeout_range_clock+0xa0/0x110
> [369146.208871] ? hrtimer_init_sleeper+0x90/0x90
> [369146.214277] ? igb_rx_fifo_flush_82575+0x32/0x270 [igb]
> [369146.220738] ? igb_configure+0x417/0x650 [igb]
> [369146.226247] ? igb_up+0x51/0x160 [igb]
> [369146.230913] igb_io_resume+0x31/0x50 [igb]
> [369146.235998] report_resume+0x5c/0x80
> [369146.240449] ? pcie_portdrv_probe+0x70/0x70
> [369146.245631] pci_walk_bus+0x75/0x90
> [369146.249966] pcie_do_recovery+0x163/0x280
> [369146.254947] aer_process_err_devices+0xa2/0xd1
> [369146.260455] aer_isr.cold+0x52/0xa1
> [369146.264799] ? __schedule+0x2bf/0x680
> [369146.269349] ? irq_finalize_oneshot+0xf0/0xf0
> [369146.274754] irq_thread_fn+0x28/0x50
> [369146.279205] irq_thread+0xf8/0x180
> [369146.283445] ? wake_threads_waitq+0x30/0x30
> [369146.288638] kthread+0x104/0x140
> [369146.292666] ? irq_thread_check_affinity+0x80/0x80
> [369146.298597] ? __kthread_cancel_work+0x40/0x40
> [369146.304106] ret_from_fork+0x35/0x40
> - - - snip - - -
>
> BUG_ON() comes from
>
> - - - snip - - -
> static inline void napi_enable(struct napi_struct *n)
> {
> BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
> smp_mb__before_atomic();
> clear_bit(NAPI_STATE_SCHED, &n->state);
> clear_bit(NAPI_STATE_NPSVC, &n->state);
> }
> - - - snip - - -
>
> The stack-trace shows that the AER handler starts here and goes into igb_up():
>
> - - - snip - - -
> static void igb_io_resume(struct pci_dev *pdev)
> {
> struct net_device *netdev = pci_get_drvdata(pdev);
> struct igb_adapter *adapter = netdev_priv(netdev);
>
> if (netif_running(netdev)) {
> if (igb_up(adapter)) {
> dev_err(&pdev->dev, "igb_up failed after reset\n");
> return;
> }
> }
>
> ...
> }
> - - - snip - - -
> Three functions come into the picture:
>
> igb_io_error_detected() { // runs upon AER detection
> igb_down(); // before this commit
> noop; // with this commit
> }
>
> igb_io_resume() { // runs upon AER handling
> igb_up()
> }
>
> igb_remove() { // runs upon rmmod igb, or surprise down removal (as shown in the commit log)
> igb_down();
> }
>
> Before this commit, the flow for an AER on 09:00.0 was
> igb_down; // from igb_io_error_detect() - this sets the NAPI_STATE_SCHED bit
> igb_up; // from igb_io_resume() - that works as it finds a set NAPI_STATE_SCHED bit (and then clears it)
>
> Now with this commit, the flow for an AER on 09:00.0 is
> noop; // from igb_io_error_detect() - the NAPI_STATE_SCHED bit is not touched, e.g. it remains cleared
> igb_up; // from igb_io_resume() - BUG_ON() is triggered as it finds a cleared NAPI_STATE_SCHED bit
>
>
> I don't have a means to reproduce this i210 on thunderbolt issue and don't completely understand its flow. Before this commit, the "expected" flow for i210 on thunderbolt was probably
>
> igb_down; // from igb_io_error_detect() - this sets the NAPI_STATE_SCHED bit
> igb_up; // from igb_io_resume() - that works as it finds a set NAPI_STATE_SCHED bit and cleans it.
> igb_down; // from igb_remove() - that works b/c of the previous igb_up()
>
> The bug report shows that this was not the case. I don't completely understand the reported problem: what happens to and in igb_io_resume() and its call to igb_up()? In the commit log backtrace, I see igb_remove() eventually calling __dev_close_many() which clears the __LINK__STATE_START bit. igb_io_resume() does check this bit via netif_running() and calls only then igb_up(). My guess is igb_io_resume() executes after igb_remove() and skips therefore the execution of igb_up(). As we see from the commit log backtrace, that makes igb_remove() starve now in igb_down()->napi_synchronize().
>
> How to fix that, I don't know.
>
> Stefan
>
>

2024-06-06 15:48:32

by Stefan Schaeckeler

[permalink] [raw]
Subject: Re: [PATCH] igb: Fix igb_down hung on surprise removal

Hello Ying,

On 6/6/24 01:03, Ying Hsu wrote:
> On the CalDigit Thunderbolt Station 3 Plus, we've encountered an issue
> when the USB downstream display connection state changes. The
> problematic sequence observed is:
> ```
> igb_io_error_detected
> igb_down
> igb_io_error_detected
> igb_down
> ```
>
> The second igb_down call blocks at napi_synchronize.

From the backtrace in your commit message, I gain the impression you get a hotplug event for removing the ethernet device. From your commit message I gain the impression you get an AER as well which is handled in igb_io_error_detected()/igb_io_resume(). The problem lies IMHO in the interaction of both.


> Simply avoiding redundant igb_down calls makes the Ethernet of the thunderbolt dock unusable.

I'm not too sure if the current code is even perfect in your use-case. What happens when you get an AER on the ethernet device (without plugging it out at the same time)?

Can you try to AER inject a completion timeout into your ethernet device, similar how I showed it in my previous message? Just replace the bdf 09:00.0 with the bdf of your ethernet device. I expect a kernel crash like we see that on our embedded system.


> If Intel can identify when an Ethernet device is within a Thunderbolt
> tunnel, the patch can be more specific.

Stefan