2023-02-22 16:28:38

by Jochen Henneberg

[permalink] [raw]
Subject: Issue: stmmac reset by netdev watchdog due to tx queue timeout

I have been debugging an issue with the stmmac network driver and the
Intel Elkhart Lake SoC for quite some time now. The problem comes up
when a port forwarding is configured with iptables NAT like this:

iptables -t nat -A PREROUTING -p tcp --dport 222 \
-j DNAT --to-destination 192.168.178.134:22
iptables -t nat -A POSTROUTING -p tcp --dst 192.168.178.134 \
--dport 22 j SNAT --to-source 192.168.178.138

If I 'ssh -p 222 192.168.178.138' the sttmac is reset after some seconds
with:

[ 553.050018] NETDEV WATCHDOG: eno1 (intel-eth-pci): transmit queue 0 timed out
[ 553.050048] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x23a/0x250
[ 553.050059] Modules linked in: nft_chain_nat xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp nft_compat nf_tables libcrc32c nfnetlink snd_seq_dummy snd_hrtimer snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_codec_generic ledtrig_audio snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match binfmt_misc snd_soc_acpi soundwire_bus snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec intel_rapl_msr snd_hda_core intel_rapl_common snd_hwdep x86_pkg_temp_thermal intel_powerclamp snd_pcm coretemp nls_iso8859_1 snd_seq_midi kvm_intel snd_seq_midi_event snd_rawmidi mei_hdcp i915 kvm mei_pxp snd_seq crct10dif_pclmul ghash_clmulni_intel snd_seq_device sha512_ssse3 drm_buddy ttm cmdlinepart aesni_intel snd_timer spi_nor crypto_simd
[ 553.050187] drm_display_helper mtd cryptd snd intel_cstate cec rc_core joydev intel_wmi_thunderbolt drm_kms_helper soundcore i2c_algo_bit mei_me syscopyarea input_leds sysfillrect mei 8250_dw sysimgblt igen6_edac mac_hid intel_hid acpi_pad acpi_tad sparse_keymap msr parport_pc ppdev lp parport drm pstore_blk ramoops pstore_zone reed_solomon efi_pstore ip_tables x_tables autofs4 hid_logitech_hidpp hid_logitech_dj hid_generic usbhid uas hid usb_storage mxl_gpy polynomial mmc_block dwmac_intel spi_intel_pci i2c_i801 intel_ish_ipc gpio_kempld i2c_kempld crc32_pclmul spi_intel intel_ishtp i2c_smbus stmmac intel_lpss_pci ahci sdhci_pci xhci_pci intel_lpss xhci_pci_renesas pcs_xpcs cqhci libahci video phylink sdhci idma64 kempld_core i2c_scmi wmi pinctrl_elkhartlake(+)
[ 553.050354] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D 6.2.0-rc8+ #22
[ 553.050360] Hardware name: Default string Default string/COMe-mEL10 E2, BIOS MEL1R904 11/02/2022
[ 553.050363] RIP: 0010:dev_watchdog+0x23a/0x250
[ 553.050370] Code: 00 e9 2b ff ff ff 48 89 df c6 05 de 01 83 01 01 e8 fb 25 f8 ff 44 89 f1 48 89 de 48 c7 c7 a8 cd c0 a6 48 89 c2 e8 24 53 20 00 <0f> 0b e9 1c ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
[ 553.050373] RSP: 0018:ffffbfc34018ce38 EFLAGS: 00010246
[ 553.050378] RAX: 0000000000000000 RBX: ffffa0d392b68000 RCX: 0000000000000000
[ 553.050381] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 553.050384] RBP: ffffbfc34018ce68 R08: 0000000000000000 R09: 0000000000000000
[ 553.050386] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0d392b684c8
[ 553.050389] R13: ffffa0d392b6841c R14: 0000000000000000 R15: 0000000000000000
[ 553.050392] FS: 0000000000000000(0000) GS:ffffa0d4e4300000(0000) knlGS:0000000000000000
[ 553.050395] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 553.050398] CR2: 000055da1a873000 CR3: 000000023fc10000 CR4: 0000000000350ee0
[ 553.050402] Call Trace:
[ 553.050406] <IRQ>
[ 553.050413] ? __pfx_dev_watchdog+0x10/0x10
[ 553.050419] call_timer_fn+0x29/0x160
[ 553.050425] ? __pfx_dev_watchdog+0x10/0x10
[ 553.050429] __run_timers+0x259/0x310
[ 553.050434] run_timer_softirq+0x1d/0x40
[ 553.050437] __do_softirq+0xd6/0x346
[ 553.050444] ? hrtimer_interrupt+0x11f/0x230
[ 553.050449] __irq_exit_rcu+0xa2/0xd0
[ 553.050455] irq_exit_rcu+0xe/0x20
[ 553.050459] sysvec_apic_timer_interrupt+0x92/0xd0
[ 553.050465] </IRQ>
[ 553.050467] <TASK>
[ 553.050469] asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 553.050475] RIP: 0010:cpuidle_enter_state+0xde/0x6f0
[ 553.050481] Code: 7f 1b 5a e8 14 2f 4e ff 8b 53 04 49 89 c7 0f 1f 44 00 00 31 ff e8 22 40 4d ff 80 7d d0 00 0f 85 eb 00 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 12 02 00 00 4d 63 ee 49 83 fd 09 0f 87 c7 04 00 00
[ 553.050484] RSP: 0018:ffffbfc340113e38 EFLAGS: 00000246
[ 553.050489] RAX: 0000000000000000 RBX: ffffa0d4e433c930 RCX: 0000000000000000
[ 553.050492] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000000
[ 553.050494] RBP: ffffbfc340113e88 R08: 0000000000000000 R09: 0000000000000000
[ 553.050496] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa76bb700
[ 553.050499] R13: 0000000000000003 R14: 0000000000000003 R15: 00000080c45910fe
[ 553.050504] ? cpuidle_enter_state+0xce/0x6f0
[ 553.050508] cpuidle_enter+0x2e/0x50
[ 553.050512] do_idle+0x216/0x2a0
[ 553.050517] cpu_startup_entry+0x1d/0x20
[ 553.050521] start_secondary+0x122/0x160
[ 553.050527] secondary_startup_64_no_verify+0xe5/0xeb
[ 553.050535] </TASK>
[ 553.050537] ---[ end trace 0000000000000000 ]---
[ 553.050584] intel-eth-pci 0000:00:1d.1 eno1: Reset adapter.
[ 553.064639] intel-eth-pci 0000:00:1d.1 eno1: FPE workqueue stop
[ 553.065416] intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-0
[ 553.066003] intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-1
[ 553.066469] intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-2
[ 553.066933] intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-3
[ 553.067428] intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-4
[ 553.067895] intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-5
[ 553.068466] intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-6
[ 553.069025] intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-7
[ 553.082011] dwmac4: Master AXI performs any burst length
[ 553.082094] intel-eth-pci 0000:00:1d.1 eno1: Enabling Safety Features
[ 553.082138] intel-eth-pci 0000:00:1d.1 eno1: IEEE 1588-2008 Advanced Timestamp supported
[ 553.082438] intel-eth-pci 0000:00:1d.1 eno1: registered PTP clock
[ 553.082641] intel-eth-pci 0000:00:1d.1 eno1: FPE workqueue start
[ 553.082649] intel-eth-pci 0000:00:1d.1 eno1: configuring for inband/sgmii link mode
[ 553.083700] intel-eth-pci 0000:00:1d.1 eno1: Link is Up - 1Gbps/Full - flow control off

This does not happen during normal ssh to the EHL board, this does not
happen if I do port forwarding through userspace, e. g. with socat. And
this does not happen if two independent network interfaces are used in
the iptables rules. I have not observed the issue with other network
chips with the same kernel version so I think the issue must be in the
stmmac driver.

What happens is that the OWNED bit of a DMA descriptor is not reset
(detected in dwmac4_wrback_get_tx_status()) and the queue's txtimer is
looping endlessly until the timeout resets everything. Sometimes the
situation is solved before the timeout, most of the times it is not but
i always takes long until the OWNED bit is reset. This can be reproduced
with 100% success.

I have tried to understand where the issue comes from and I think it may
have something to do with the calls to dma_wmb() and wmb() (or more
precisely with missing barriers) but so far I had no success to solve it
and provide a patch.

The kernel that I am running is a quite recent linux-net
(b60417a9f2b8). I have also tried with linux-net-next without
success. The issue can already be observed with v5.15.39.

Any help or suggestion how I can debug this further would be
appreciated. Or if somebody else can reproduce or not reproduce the
issue on the given platform may help as well.

Regards
-Jochen


2023-02-22 16:36:00

by Jochen Henneberg

[permalink] [raw]
Subject: Re: Issue: stmmac reset by netdev watchdog due to tx queue timeout

Jochen Henneberg <[email protected]> writes:

> I have been debugging an issue with the stmmac network driver and the
> Intel Elkhart Lake SoC for quite some time now. The problem comes up
> when a port forwarding is configured with iptables NAT like this:
>
> iptables -t nat -A PREROUTING -p tcp --dport 222 \
> -j DNAT --to-destination 192.168.178.134:22
> iptables -t nat -A POSTROUTING -p tcp --dst 192.168.178.134 \
> --dport 22 j SNAT --to-source 192.168.178.138
>
> If I 'ssh -p 222 192.168.178.138' the sttmac is reset after some seconds
> with:
>
> [ 553.050018] NETDEV WATCHDOG: eno1 (intel-eth-pci): transmit queue 0
> timed out
> [ 553.050048] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:525
> dev_watchdog+0x23a/0x250
> [ 553.050059] Modules linked in: nft_chain_nat xt_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp nft_compat
> nf_tables libcrc32c nfnetlink snd_seq_dummy snd_hrtimer
> snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_codec_generic
> ledtrig_audio snd_sof_pci_intel_tgl snd_sof_intel_hda_common
> soundwire_intel soundwire_generic_allocation soundwire_cadence
> snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils
> snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match binfmt_misc
> snd_soc_acpi soundwire_bus snd_soc_core snd_compress ac97_bus
> snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi
> snd_hda_codec intel_rapl_msr snd_hda_core intel_rapl_common snd_hwdep
> x86_pkg_temp_thermal intel_powerclamp snd_pcm coretemp nls_iso8859_1
> snd_seq_midi kvm_intel snd_seq_midi_event snd_rawmidi mei_hdcp i915
> kvm mei_pxp snd_seq crct10dif_pclmul ghash_clmulni_intel
> snd_seq_device sha512_ssse3 drm_buddy ttm cmdlinepart aesni_intel
> snd_timer spi_nor crypto_simd
> [ 553.050187] drm_display_helper mtd cryptd snd intel_cstate cec
> rc_core joydev intel_wmi_thunderbolt drm_kms_helper soundcore
> i2c_algo_bit mei_me syscopyarea input_leds sysfillrect mei 8250_dw
> sysimgblt igen6_edac mac_hid intel_hid acpi_pad acpi_tad sparse_keymap
> msr parport_pc ppdev lp parport drm pstore_blk ramoops pstore_zone
> reed_solomon efi_pstore ip_tables x_tables autofs4 hid_logitech_hidpp
> hid_logitech_dj hid_generic usbhid uas hid usb_storage mxl_gpy
> polynomial mmc_block dwmac_intel spi_intel_pci i2c_i801 intel_ish_ipc
> gpio_kempld i2c_kempld crc32_pclmul spi_intel intel_ishtp i2c_smbus
> stmmac intel_lpss_pci ahci sdhci_pci xhci_pci intel_lpss
> xhci_pci_renesas pcs_xpcs cqhci libahci video phylink sdhci idma64
> kempld_core i2c_scmi wmi pinctrl_elkhartlake(+)
> [ 553.050354] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D 6.2.0-rc8+
> #22
> [ 553.050360] Hardware name: Default string Default string/COMe-mEL10
> E2, BIOS MEL1R904 11/02/2022
> [ 553.050363] RIP: 0010:dev_watchdog+0x23a/0x250
> [ 553.050370] Code: 00 e9 2b ff ff ff 48 89 df c6 05 de 01 83 01 01 e8
> fb 25 f8 ff 44 89 f1 48 89 de 48 c7 c7 a8 cd c0 a6 48 89 c2 e8 24 53
> 20 00 <0f> 0b e9 1c ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
> 00
> [ 553.050373] RSP: 0018:ffffbfc34018ce38 EFLAGS: 00010246
> [ 553.050378] RAX: 0000000000000000 RBX: ffffa0d392b68000 RCX:
> 0000000000000000
> [ 553.050381] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> 0000000000000000
> [ 553.050384] RBP: ffffbfc34018ce68 R08: 0000000000000000 R09:
> 0000000000000000
> [ 553.050386] R10: 0000000000000000 R11: 0000000000000000 R12:
> ffffa0d392b684c8
> [ 553.050389] R13: ffffa0d392b6841c R14: 0000000000000000 R15:
> 0000000000000000
> [ 553.050392] FS: 0000000000000000(0000) GS:ffffa0d4e4300000(0000)
> knlGS:0000000000000000
> [ 553.050395] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 553.050398] CR2: 000055da1a873000 CR3: 000000023fc10000 CR4:
> 0000000000350ee0
> [ 553.050402] Call Trace:
> [ 553.050406] <IRQ>
> [ 553.050413] ? __pfx_dev_watchdog+0x10/0x10
> [ 553.050419] call_timer_fn+0x29/0x160
> [ 553.050425] ? __pfx_dev_watchdog+0x10/0x10
> [ 553.050429] __run_timers+0x259/0x310
> [ 553.050434] run_timer_softirq+0x1d/0x40
> [ 553.050437] __do_softirq+0xd6/0x346
> [ 553.050444] ? hrtimer_interrupt+0x11f/0x230
> [ 553.050449] __irq_exit_rcu+0xa2/0xd0
> [ 553.050455] irq_exit_rcu+0xe/0x20
> [ 553.050459] sysvec_apic_timer_interrupt+0x92/0xd0
> [ 553.050465] </IRQ>
> [ 553.050467] <TASK>
> [ 553.050469] asm_sysvec_apic_timer_interrupt+0x1b/0x20
> [ 553.050475] RIP: 0010:cpuidle_enter_state+0xde/0x6f0
> [ 553.050481] Code: 7f 1b 5a e8 14 2f 4e ff 8b 53 04 49 89 c7 0f 1f 44
> 00 00 31 ff e8 22 40 4d ff 80 7d d0 00 0f 85 eb 00 00 00 fb 0f 1f 44
> 00 00 <45> 85 f6 0f 88 12 02 00 00 4d 63 ee 49 83 fd 09 0f 87 c7 04 00
> 00
> [ 553.050484] RSP: 0018:ffffbfc340113e38 EFLAGS: 00000246
> [ 553.050489] RAX: 0000000000000000 RBX: ffffa0d4e433c930 RCX:
> 0000000000000000
> [ 553.050492] RDX: 0000000000000002 RSI: 0000000000000000 RDI:
> 0000000000000000
> [ 553.050494] RBP: ffffbfc340113e88 R08: 0000000000000000 R09:
> 0000000000000000
> [ 553.050496] R10: 0000000000000000 R11: 0000000000000000 R12:
> ffffffffa76bb700
> [ 553.050499] R13: 0000000000000003 R14: 0000000000000003 R15:
> 00000080c45910fe
> [ 553.050504] ? cpuidle_enter_state+0xce/0x6f0
> [ 553.050508] cpuidle_enter+0x2e/0x50
> [ 553.050512] do_idle+0x216/0x2a0
> [ 553.050517] cpu_startup_entry+0x1d/0x20
> [ 553.050521] start_secondary+0x122/0x160
> [ 553.050527] secondary_startup_64_no_verify+0xe5/0xeb
> [ 553.050535] </TASK>
> [ 553.050537] ---[ end trace 0000000000000000 ]---
> [ 553.050584] intel-eth-pci 0000:00:1d.1 eno1: Reset adapter.
> [ 553.064639] intel-eth-pci 0000:00:1d.1 eno1: FPE workqueue stop
> [ 553.065416] intel-eth-pci 0000:00:1d.1 eno1: Register
> MEM_TYPE_PAGE_POOL RxQ-0
> [ 553.066003] intel-eth-pci 0000:00:1d.1 eno1: Register
> MEM_TYPE_PAGE_POOL RxQ-1
> [ 553.066469] intel-eth-pci 0000:00:1d.1 eno1: Register
> MEM_TYPE_PAGE_POOL RxQ-2
> [ 553.066933] intel-eth-pci 0000:00:1d.1 eno1: Register
> MEM_TYPE_PAGE_POOL RxQ-3
> [ 553.067428] intel-eth-pci 0000:00:1d.1 eno1: Register
> MEM_TYPE_PAGE_POOL RxQ-4
> [ 553.067895] intel-eth-pci 0000:00:1d.1 eno1: Register
> MEM_TYPE_PAGE_POOL RxQ-5
> [ 553.068466] intel-eth-pci 0000:00:1d.1 eno1: Register
> MEM_TYPE_PAGE_POOL RxQ-6
> [ 553.069025] intel-eth-pci 0000:00:1d.1 eno1: Register
> MEM_TYPE_PAGE_POOL RxQ-7
> [ 553.082011] dwmac4: Master AXI performs any burst length
> [ 553.082094] intel-eth-pci 0000:00:1d.1 eno1: Enabling Safety Features
> [ 553.082138] intel-eth-pci 0000:00:1d.1 eno1: IEEE 1588-2008 Advanced
> Timestamp supported
> [ 553.082438] intel-eth-pci 0000:00:1d.1 eno1: registered PTP clock
> [ 553.082641] intel-eth-pci 0000:00:1d.1 eno1: FPE workqueue start
> [ 553.082649] intel-eth-pci 0000:00:1d.1 eno1: configuring for
> inband/sgmii link mode
> [ 553.083700] intel-eth-pci 0000:00:1d.1 eno1: Link is Up - 1Gbps/Full
> - flow control off
>
> This does not happen during normal ssh to the EHL board, this does not
> happen if I do port forwarding through userspace, e. g. with socat. And
> this does not happen if two independent network interfaces are used in
> the iptables rules. I have not observed the issue with other network
> chips with the same kernel version so I think the issue must be in the
> stmmac driver.
>
> What happens is that the OWNED bit of a DMA descriptor is not reset
> (detected in dwmac4_wrback_get_tx_status()) and the queue's txtimer is
> looping endlessly until the timeout resets everything. Sometimes the
> situation is solved before the timeout, most of the times it is not but
> i always takes long until the OWNED bit is reset. This can be reproduced
> with 100% success.
>
> I have tried to understand where the issue comes from and I think it may
> have something to do with the calls to dma_wmb() and wmb() (or more
> precisely with missing barriers) but so far I had no success to solve it
> and provide a patch.
>
> The kernel that I am running is a quite recent linux-net
> (b60417a9f2b8). I have also tried with linux-net-next without
> success. The issue can already be observed with v5.15.39.
>
> Any help or suggestion how I can debug this further would be
> appreciated. Or if somebody else can reproduce or not reproduce the
> issue on the given platform may help as well.
>
> Regards
> -Jochen

Some more details about the devices involved:

intel-eth-pci 0000:00:1d.1: enabling device (0000 -> 0002)
intel-eth-pci 0000:00:1d.1: stmmac_config_multi_msi: multi MSI enablement successful
intel-eth-pci 0000:00:1d.1: User ID: 0x51, Synopsys ID: 0x52
intel-eth-pci 0000:00:1d.1: DWMAC4/5
intel-eth-pci 0000:00:1d.1: DMA HW capability register supported
intel-eth-pci 0000:00:1d.1: RX Checksum Offload Engine supported
intel-eth-pci 0000:00:1d.1: TX Checksum insertion supported
intel-eth-pci 0000:00:1d.1: TSO supported
intel-eth-pci 0000:00:1d.1: Enable RX Mitigation via HW Watchdog Timer
intel-eth-pci 0000:00:1d.1: device MAC address 00:e0:4b:74:93:db
intel-eth-pci 0000:00:1d.1: Enabled L3L4 Flow TC (entries=2)
intel-eth-pci 0000:00:1d.1: Enabled RFS Flow TC (entries=10)
intel-eth-pci 0000:00:1d.1: Enabling HW TC (entries=256, max_off=256)
intel-eth-pci 0000:00:1d.1: TSO feature enabled
intel-eth-pci 0000:00:1d.1: Using 32 bits DMA width
Maxlinear Ethernet GPY115B stmmac-2:01: Firmware Version: 7.110 (0x876E)
Maxlinear Ethernet GPY115B stmmac-2:01: attached PHY driver (mii_bus:phy_addr=stmmac-2:01>
intel-eth-pci 0000:00:1d.1 eno1: renamed from eth0
intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-0
intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-1
intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-2
intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-3
intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-4
intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-5
intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-6
intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-7
intel-eth-pci 0000:00:1d.1 eno1: Enabling Safety Features
intel-eth-pci 0000:00:1d.1 eno1: IEEE 1588-2008 Advanced Timestamp supported
intel-eth-pci 0000:00:1d.1 eno1: registered PTP clock
intel-eth-pci 0000:00:1d.1 eno1: FPE workqueue start
intel-eth-pci 0000:00:1d.1 eno1: configuring for inband/sgmii link mode
intel-eth-pci 0000:00:1d.1 eno1: Link is Up - 1Gbps/Full - flow control off

Regards
-Jochen

2023-03-14 12:42:34

by Jochen Henneberg

[permalink] [raw]
Subject: Re: Issue: stmmac reset by netdev watchdog due to tx queue timeout


Jochen Henneberg <[email protected]> writes:

> Jochen Henneberg <[email protected]> writes:
>
>> I have been debugging an issue with the stmmac network driver and the
>> Intel Elkhart Lake SoC for quite some time now. The problem comes up
>> when a port forwarding is configured with iptables NAT like this:
>>
>> iptables -t nat -A PREROUTING -p tcp --dport 222 \
>> -j DNAT --to-destination 192.168.178.134:22
>> iptables -t nat -A POSTROUTING -p tcp --dst 192.168.178.134 \
>> --dport 22 j SNAT --to-source 192.168.178.138
>>
>> If I 'ssh -p 222 192.168.178.138' the sttmac is reset after some seconds
>> with:
>>
>> [ 553.050018] NETDEV WATCHDOG: eno1 (intel-eth-pci): transmit queue 0
>> timed out
>> [ 553.050048] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:525
>> dev_watchdog+0x23a/0x250
>> [ 553.050059] Modules linked in: nft_chain_nat xt_nat nf_nat
>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp nft_compat
>> nf_tables libcrc32c nfnetlink snd_seq_dummy snd_hrtimer
>> snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_codec_generic
>> ledtrig_audio snd_sof_pci_intel_tgl snd_sof_intel_hda_common
>> soundwire_intel soundwire_generic_allocation soundwire_cadence
>> snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils
>> snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match binfmt_misc
>> snd_soc_acpi soundwire_bus snd_soc_core snd_compress ac97_bus
>> snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi
>> snd_hda_codec intel_rapl_msr snd_hda_core intel_rapl_common snd_hwdep
>> x86_pkg_temp_thermal intel_powerclamp snd_pcm coretemp nls_iso8859_1
>> snd_seq_midi kvm_intel snd_seq_midi_event snd_rawmidi mei_hdcp i915
>> kvm mei_pxp snd_seq crct10dif_pclmul ghash_clmulni_intel
>> snd_seq_device sha512_ssse3 drm_buddy ttm cmdlinepart aesni_intel
>> snd_timer spi_nor crypto_simd
>> [ 553.050187] drm_display_helper mtd cryptd snd intel_cstate cec
>> rc_core joydev intel_wmi_thunderbolt drm_kms_helper soundcore
>> i2c_algo_bit mei_me syscopyarea input_leds sysfillrect mei 8250_dw
>> sysimgblt igen6_edac mac_hid intel_hid acpi_pad acpi_tad sparse_keymap
>> msr parport_pc ppdev lp parport drm pstore_blk ramoops pstore_zone
>> reed_solomon efi_pstore ip_tables x_tables autofs4 hid_logitech_hidpp
>> hid_logitech_dj hid_generic usbhid uas hid usb_storage mxl_gpy
>> polynomial mmc_block dwmac_intel spi_intel_pci i2c_i801 intel_ish_ipc
>> gpio_kempld i2c_kempld crc32_pclmul spi_intel intel_ishtp i2c_smbus
>> stmmac intel_lpss_pci ahci sdhci_pci xhci_pci intel_lpss
>> xhci_pci_renesas pcs_xpcs cqhci libahci video phylink sdhci idma64
>> kempld_core i2c_scmi wmi pinctrl_elkhartlake(+)
>> [ 553.050354] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D 6.2.0-rc8+
>> #22
>> [ 553.050360] Hardware name: Default string Default string/COMe-mEL10
>> E2, BIOS MEL1R904 11/02/2022
>> [ 553.050363] RIP: 0010:dev_watchdog+0x23a/0x250
>> [ 553.050370] Code: 00 e9 2b ff ff ff 48 89 df c6 05 de 01 83 01 01 e8
>> fb 25 f8 ff 44 89 f1 48 89 de 48 c7 c7 a8 cd c0 a6 48 89 c2 e8 24 53
>> 20 00 <0f> 0b e9 1c ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
>> 00
>> [ 553.050373] RSP: 0018:ffffbfc34018ce38 EFLAGS: 00010246
>> [ 553.050378] RAX: 0000000000000000 RBX: ffffa0d392b68000 RCX:
>> 0000000000000000
>> [ 553.050381] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
>> 0000000000000000
>> [ 553.050384] RBP: ffffbfc34018ce68 R08: 0000000000000000 R09:
>> 0000000000000000
>> [ 553.050386] R10: 0000000000000000 R11: 0000000000000000 R12:
>> ffffa0d392b684c8
>> [ 553.050389] R13: ffffa0d392b6841c R14: 0000000000000000 R15:
>> 0000000000000000
>> [ 553.050392] FS: 0000000000000000(0000) GS:ffffa0d4e4300000(0000)
>> knlGS:0000000000000000
>> [ 553.050395] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 553.050398] CR2: 000055da1a873000 CR3: 000000023fc10000 CR4:
>> 0000000000350ee0
>> [ 553.050402] Call Trace:
>> [ 553.050406] <IRQ>
>> [ 553.050413] ? __pfx_dev_watchdog+0x10/0x10
>> [ 553.050419] call_timer_fn+0x29/0x160
>> [ 553.050425] ? __pfx_dev_watchdog+0x10/0x10
>> [ 553.050429] __run_timers+0x259/0x310
>> [ 553.050434] run_timer_softirq+0x1d/0x40
>> [ 553.050437] __do_softirq+0xd6/0x346
>> [ 553.050444] ? hrtimer_interrupt+0x11f/0x230
>> [ 553.050449] __irq_exit_rcu+0xa2/0xd0
>> [ 553.050455] irq_exit_rcu+0xe/0x20
>> [ 553.050459] sysvec_apic_timer_interrupt+0x92/0xd0
>> [ 553.050465] </IRQ>
>> [ 553.050467] <TASK>
>> [ 553.050469] asm_sysvec_apic_timer_interrupt+0x1b/0x20
>> [ 553.050475] RIP: 0010:cpuidle_enter_state+0xde/0x6f0
>> [ 553.050481] Code: 7f 1b 5a e8 14 2f 4e ff 8b 53 04 49 89 c7 0f 1f 44
>> 00 00 31 ff e8 22 40 4d ff 80 7d d0 00 0f 85 eb 00 00 00 fb 0f 1f 44
>> 00 00 <45> 85 f6 0f 88 12 02 00 00 4d 63 ee 49 83 fd 09 0f 87 c7 04 00
>> 00
>> [ 553.050484] RSP: 0018:ffffbfc340113e38 EFLAGS: 00000246
>> [ 553.050489] RAX: 0000000000000000 RBX: ffffa0d4e433c930 RCX:
>> 0000000000000000
>> [ 553.050492] RDX: 0000000000000002 RSI: 0000000000000000 RDI:
>> 0000000000000000
>> [ 553.050494] RBP: ffffbfc340113e88 R08: 0000000000000000 R09:
>> 0000000000000000
>> [ 553.050496] R10: 0000000000000000 R11: 0000000000000000 R12:
>> ffffffffa76bb700
>> [ 553.050499] R13: 0000000000000003 R14: 0000000000000003 R15:
>> 00000080c45910fe
>> [ 553.050504] ? cpuidle_enter_state+0xce/0x6f0
>> [ 553.050508] cpuidle_enter+0x2e/0x50
>> [ 553.050512] do_idle+0x216/0x2a0
>> [ 553.050517] cpu_startup_entry+0x1d/0x20
>> [ 553.050521] start_secondary+0x122/0x160
>> [ 553.050527] secondary_startup_64_no_verify+0xe5/0xeb
>> [ 553.050535] </TASK>
>> [ 553.050537] ---[ end trace 0000000000000000 ]---
>> [ 553.050584] intel-eth-pci 0000:00:1d.1 eno1: Reset adapter.
>> [ 553.064639] intel-eth-pci 0000:00:1d.1 eno1: FPE workqueue stop
>> [ 553.065416] intel-eth-pci 0000:00:1d.1 eno1: Register
>> MEM_TYPE_PAGE_POOL RxQ-0
>> [ 553.066003] intel-eth-pci 0000:00:1d.1 eno1: Register
>> MEM_TYPE_PAGE_POOL RxQ-1
>> [ 553.066469] intel-eth-pci 0000:00:1d.1 eno1: Register
>> MEM_TYPE_PAGE_POOL RxQ-2
>> [ 553.066933] intel-eth-pci 0000:00:1d.1 eno1: Register
>> MEM_TYPE_PAGE_POOL RxQ-3
>> [ 553.067428] intel-eth-pci 0000:00:1d.1 eno1: Register
>> MEM_TYPE_PAGE_POOL RxQ-4
>> [ 553.067895] intel-eth-pci 0000:00:1d.1 eno1: Register
>> MEM_TYPE_PAGE_POOL RxQ-5
>> [ 553.068466] intel-eth-pci 0000:00:1d.1 eno1: Register
>> MEM_TYPE_PAGE_POOL RxQ-6
>> [ 553.069025] intel-eth-pci 0000:00:1d.1 eno1: Register
>> MEM_TYPE_PAGE_POOL RxQ-7
>> [ 553.082011] dwmac4: Master AXI performs any burst length
>> [ 553.082094] intel-eth-pci 0000:00:1d.1 eno1: Enabling Safety Features
>> [ 553.082138] intel-eth-pci 0000:00:1d.1 eno1: IEEE 1588-2008 Advanced
>> Timestamp supported
>> [ 553.082438] intel-eth-pci 0000:00:1d.1 eno1: registered PTP clock
>> [ 553.082641] intel-eth-pci 0000:00:1d.1 eno1: FPE workqueue start
>> [ 553.082649] intel-eth-pci 0000:00:1d.1 eno1: configuring for
>> inband/sgmii link mode
>> [ 553.083700] intel-eth-pci 0000:00:1d.1 eno1: Link is Up - 1Gbps/Full
>> - flow control off
>>
>> This does not happen during normal ssh to the EHL board, this does not
>> happen if I do port forwarding through userspace, e. g. with socat. And
>> this does not happen if two independent network interfaces are used in
>> the iptables rules. I have not observed the issue with other network
>> chips with the same kernel version so I think the issue must be in the
>> stmmac driver.
>>
>> What happens is that the OWNED bit of a DMA descriptor is not reset
>> (detected in dwmac4_wrback_get_tx_status()) and the queue's txtimer is
>> looping endlessly until the timeout resets everything. Sometimes the
>> situation is solved before the timeout, most of the times it is not but
>> i always takes long until the OWNED bit is reset. This can be reproduced
>> with 100% success.
>>
>> I have tried to understand where the issue comes from and I think it may
>> have something to do with the calls to dma_wmb() and wmb() (or more
>> precisely with missing barriers) but so far I had no success to solve it
>> and provide a patch.
>>
>> The kernel that I am running is a quite recent linux-net
>> (b60417a9f2b8). I have also tried with linux-net-next without
>> success. The issue can already be observed with v5.15.39.
>>
>> Any help or suggestion how I can debug this further would be
>> appreciated. Or if somebody else can reproduce or not reproduce the
>> issue on the given platform may help as well.
>>
>> Regards
>> -Jochen
>
> Some more details about the devices involved:
>
> intel-eth-pci 0000:00:1d.1: enabling device (0000 -> 0002)
> intel-eth-pci 0000:00:1d.1: stmmac_config_multi_msi: multi MSI enablement successful
> intel-eth-pci 0000:00:1d.1: User ID: 0x51, Synopsys ID: 0x52
> intel-eth-pci 0000:00:1d.1: DWMAC4/5
> intel-eth-pci 0000:00:1d.1: DMA HW capability register supported
> intel-eth-pci 0000:00:1d.1: RX Checksum Offload Engine supported
> intel-eth-pci 0000:00:1d.1: TX Checksum insertion supported
> intel-eth-pci 0000:00:1d.1: TSO supported
> intel-eth-pci 0000:00:1d.1: Enable RX Mitigation via HW Watchdog Timer
> intel-eth-pci 0000:00:1d.1: device MAC address 00:e0:4b:74:93:db
> intel-eth-pci 0000:00:1d.1: Enabled L3L4 Flow TC (entries=2)
> intel-eth-pci 0000:00:1d.1: Enabled RFS Flow TC (entries=10)
> intel-eth-pci 0000:00:1d.1: Enabling HW TC (entries=256, max_off=256)
> intel-eth-pci 0000:00:1d.1: TSO feature enabled
> intel-eth-pci 0000:00:1d.1: Using 32 bits DMA width
> Maxlinear Ethernet GPY115B stmmac-2:01: Firmware Version: 7.110 (0x876E)
> Maxlinear Ethernet GPY115B stmmac-2:01: attached PHY driver (mii_bus:phy_addr=stmmac-2:01>
> intel-eth-pci 0000:00:1d.1 eno1: renamed from eth0
> intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-0
> intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-1
> intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-2
> intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-3
> intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-4
> intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-5
> intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-6
> intel-eth-pci 0000:00:1d.1 eno1: Register MEM_TYPE_PAGE_POOL RxQ-7
> intel-eth-pci 0000:00:1d.1 eno1: Enabling Safety Features
> intel-eth-pci 0000:00:1d.1 eno1: IEEE 1588-2008 Advanced Timestamp supported
> intel-eth-pci 0000:00:1d.1 eno1: registered PTP clock
> intel-eth-pci 0000:00:1d.1 eno1: FPE workqueue start
> intel-eth-pci 0000:00:1d.1 eno1: configuring for inband/sgmii link mode
> intel-eth-pci 0000:00:1d.1 eno1: Link is Up - 1Gbps/Full - flow control off
>

No further help needed. I have found the root cause and will provide a
fix.

-Jochen