2021-04-07 01:53:39

by Oleksandr Natalenko

[permalink] [raw]
Subject: [igb] netconsole triggers warning in netpoll_poll_dev

Hello.

I've raised this here [1] first, but was suggested to engage igb devs,
so here we are.

I'm experiencing the following woes while using netconsole regularly:

```
[22038.710800] ------------[ cut here ]------------
[22038.710801] igb_poll+0x0/0x1440 [igb] exceeded budget in poll
[22038.710802] WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155 netpoll_poll_dev+0x18a/0x1a0
[22038.710802] Modules linked in: blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc nfs_ssc fscache uinput netconsole rfcomm nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 cmac algif_hash algif_skcipher af_alg nf_tables tun bnep nfnetlink iwlmvm snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mac80211 snd_hda_codec_hdmi snd_hda_intel nls_iso8859_1 snd_intel_dspcfg soundwire_intel vfat libarc4 soundwire_generic_allocation soundwire_cadence fat intel_rapl_msr snd_hda_codec intel_rapl_common iwlwifi snd_hda_core soundwire_bus snd_usb_audio snd_soc_core eeepc_wmi btusb asus_wmi snd_usbmidi_lib btrtl edac_mce_amd sparse_keymap btbcm video wmi_bmof mxm_wmi kvm_amd snd_hwdep snd_compress btintel snd_rawmidi ac97_bus cfg80211 uvcvideo bluetooth snd_pcm_dmaengine snd_seq_device snd_pcm videobuf2_vmalloc kvm videobuf2_memops snd_timer videobuf2_v4l2 joydev igb mousedev ecdh_generic snd irqbypass pl2303 ipmi_devintf ecc videobuf2_common soundcore
[22038.710820] r8169 ipmi_msghandler rapl crc16 k10temp rfkill sp5100_tco dca i2c_piix4 tpm_crb realtek mdio_devres tpm_tis libphy tpm_tis_core pinctrl_amd wmi acpi_cpufreq mac_hid tcp_bbr2 vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock v4l2loopback videodev mc nct6775 hwmon_vid crypto_user fuse ip_tables x_tables xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c crc32c_generic dm_crypt cbc encrypted_keys trusted tpm hid_logitech_hidpp hid_logitech_dj usbhid dm_mod raid10 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel md_mod crypto_simd cryptd glue_helper ccp amdgpu rng_core xhci_pci xhci_pci_renesas drm_ttm_helper ttm gpu_sched i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm agpgart
[22038.710835] CPU: 12 PID: 40362 Comm: systemd-sleep Not tainted 5.11.0-pf7 #1
[22038.710836] Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS 3302 03/05/2021
[22038.710836] RIP: 0010:netpoll_poll_dev+0x18a/0x1a0
[22038.710837] Code: 6e ff 80 3d d2 9d f8 00 00 0f 85 5c ff ff ff 48 8b 73 28 48 c7 c7 0c b8 21 84 89 44 24 04 c6 05 b6 9d f8 00 01 e8 84 21 1c 00 <0f> 0b 8b 54 24 04 e9 36 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00
[22038.710838] RSP: 0018:ffffb24106e37ba0 EFLAGS: 00010086
[22038.710838] RAX: 0000000000000000 RBX: ffff9599d2929c50 RCX: ffff959f8ed1ac30
[22038.710839] RDX: 0000000000000000 RSI: 0000000000000023 RDI: ffff959f8ed1ac28
[22038.710839] RBP: ffff9598981d4058 R08: 0000000000000019 R09: ffffb24206e3796d
[22038.710839] R10: ffffffffffffffff R11: ffffb24106e37968 R12: ffff959887e51ec8
[22038.710840] R13: 000000000000000c R14: 00000000ffffffff R15: ffff9599d2929c60
[22038.710840] FS: 00007f3ade370a40(0000) GS:ffff959f8ed00000(0000) knlGS:0000000000000000
[22038.710841] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[22038.710841] CR2: 0000000000000000 CR3: 00000003017b0000 CR4: 0000000000350ee0
[22038.710841] Call Trace:
[22038.710842] netpoll_send_skb+0x185/0x240
[22038.710842] write_msg+0xe5/0x100 [netconsole]
[22038.710842] console_unlock+0x37d/0x640
[22038.710842] ? __schedule+0x2e5/0xc90
[22038.710843] suspend_devices_and_enter+0x2ac/0x7f0
[22038.710843] pm_suspend.cold+0x321/0x36c
[22038.710843] state_store+0xa6/0x140
[22038.710844] kernfs_fop_write_iter+0x124/0x1b0
[22038.710844] new_sync_write+0x16a/0x200
[22038.710844] vfs_write+0x21c/0x2e0
[22038.710844] __x64_sys_write+0x6d/0xf0
[22038.710845] do_syscall_64+0x33/0x40
[22038.710845] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[22038.710845] RIP: 0033:0x7f3adece10f7
[22038.710846] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[22038.710847] RSP: 002b:00007ffc51c555b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[22038.710847] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f3adece10f7
[22038.710848] RDX: 0000000000000004 RSI: 00007ffc51c556a0 RDI: 0000000000000004
[22038.710848] RBP: 00007ffc51c556a0 R08: 000055ea374302a0 R09: 00007f3aded770c0
[22038.710849] R10: 00007f3aded76fc0 R11: 0000000000000246 R12: 0000000000000004
[22038.710849] R13: 000055ea3742c430 R14: 0000000000000004 R15: 00007f3adedb3700
[22038.710849] ---[ end trace 6eae54fbf23807f8 ]---
```

This one happened during suspend/resume cycle (on resume), followed by:

```
[22038.868669] igb 0000:05:00.0 enp5s0: Reset adapter
[22040.998673] igb 0000:05:00.0 enp5s0: Reset adapter
[22043.819198] igb 0000:05:00.0 enp5s0: igb: enp5s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
```

I've bumped into a similar issue in BZ 211911 [2] (see c#16),
and in c#17 it was suggested it was a separate unrelated issue,
hence I'm raising a new concern.

Please help in finding out why this woe happens and in fixing it.

Thanks.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=212573
[2] https://bugzilla.kernel.org/show_bug.cgi?id=211911

--
Oleksandr Natalenko (post-factum)


2021-04-07 08:20:04

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [igb] netconsole triggers warning in netpoll_poll_dev

On Tue, 6 Apr 2021 14:36:19 +0200 Oleksandr Natalenko wrote:
> Hello.
>
> I've raised this here [1] first, but was suggested to engage igb devs,
> so here we are.
>
> I'm experiencing the following woes while using netconsole regularly:
>
> ```
> [22038.710800] ------------[ cut here ]------------
> [22038.710801] igb_poll+0x0/0x1440 [igb] exceeded budget in poll
> [22038.710802] WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155 netpoll_poll_dev+0x18a/0x1a0
> [22038.710802] Modules linked in: ...
> [22038.710835] CPU: 12 PID: 40362 Comm: systemd-sleep Not tainted 5.11.0-pf7 #1
> [22038.710836] Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS 3302 03/05/2021
> [22038.710836] RIP: 0010:netpoll_poll_dev+0x18a/0x1a0
> [22038.710837] Code: 6e ff 80 3d d2 9d f8 00 00 0f 85 5c ff ff ff 48 8b 73 28 48 c7 c7 0c b8 21 84 89 44 24 04 c6 05 b6 9d f8 00 01 e8 84 21 1c 00 <0f> 0b 8b 54 24 04 e9 36 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00
> [22038.710838] RSP: 0018:ffffb24106e37ba0 EFLAGS: 00010086
> [22038.710838] RAX: 0000000000000000 RBX: ffff9599d2929c50 RCX: ffff959f8ed1ac30
> [22038.710839] RDX: 0000000000000000 RSI: 0000000000000023 RDI: ffff959f8ed1ac28
> [22038.710839] RBP: ffff9598981d4058 R08: 0000000000000019 R09: ffffb24206e3796d
> [22038.710839] R10: ffffffffffffffff R11: ffffb24106e37968 R12: ffff959887e51ec8
> [22038.710840] R13: 000000000000000c R14: 00000000ffffffff R15: ffff9599d2929c60
> [22038.710840] FS: 00007f3ade370a40(0000) GS:ffff959f8ed00000(0000) knlGS:0000000000000000
> [22038.710841] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [22038.710841] CR2: 0000000000000000 CR3: 00000003017b0000 CR4: 0000000000350ee0
> [22038.710841] Call Trace:
> [22038.710842] netpoll_send_skb+0x185/0x240
> [22038.710842] write_msg+0xe5/0x100 [netconsole]
> [22038.710842] console_unlock+0x37d/0x640
> [22038.710842] ? __schedule+0x2e5/0xc90
> [22038.710843] suspend_devices_and_enter+0x2ac/0x7f0
> [22038.710843] pm_suspend.cold+0x321/0x36c
> [22038.710843] state_store+0xa6/0x140
> [22038.710844] kernfs_fop_write_iter+0x124/0x1b0
> [22038.710844] new_sync_write+0x16a/0x200
> [22038.710844] vfs_write+0x21c/0x2e0
> [22038.710844] __x64_sys_write+0x6d/0xf0
> [22038.710845] do_syscall_64+0x33/0x40
> [22038.710845] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [22038.710845] RIP: 0033:0x7f3adece10f7
> [22038.710846] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> [22038.710847] RSP: 002b:00007ffc51c555b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [22038.710847] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f3adece10f7
> [22038.710848] RDX: 0000000000000004 RSI: 00007ffc51c556a0 RDI: 0000000000000004
> [22038.710848] RBP: 00007ffc51c556a0 R08: 000055ea374302a0 R09: 00007f3aded770c0
> [22038.710849] R10: 00007f3aded76fc0 R11: 0000000000000246 R12: 0000000000000004
> [22038.710849] R13: 000055ea3742c430 R14: 0000000000000004 R15: 00007f3adedb3700
> [22038.710849] ---[ end trace 6eae54fbf23807f8 ]---
> ```
>
> This one happened during suspend/resume cycle (on resume), followed by:
>
> ```
> [22038.868669] igb 0000:05:00.0 enp5s0: Reset adapter
> [22040.998673] igb 0000:05:00.0 enp5s0: Reset adapter
> [22043.819198] igb 0000:05:00.0 enp5s0: igb: enp5s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> ```
>
> I've bumped into a similar issue in BZ 211911 [2] (see c#16),
> and in c#17 it was suggested it was a separate unrelated issue,
> hence I'm raising a new concern.
>
> Please help in finding out why this woe happens and in fixing it.
>
> Thanks.
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=212573
> [2] https://bugzilla.kernel.org/show_bug.cgi?id=211911

Looks like igb_clean_tx_irq() should not return true if budget is 0,
ever, otherwise we risk hitting the min(work, budget - 1) which may
go negative.

So something like this?

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index c9e8c65a3cfe..7a237b5311ca 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -8028,7 +8028,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
unsigned int i = tx_ring->next_to_clean;

if (test_bit(__IGB_DOWN, &adapter->state))
- return true;
+ goto out;

tx_buffer = &tx_ring->tx_buffer_info[i];
tx_desc = IGB_TX_DESC(tx_ring, i);
@@ -8157,7 +8157,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
tx_ring->queue_index);

/* we are about to reset, no point in enabling stuff */
- return true;
+ goto out;
}
}

@@ -8180,7 +8180,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
u64_stats_update_end(&tx_ring->tx_syncp);
}
}
-
+out:
return !!budget;
}

2021-04-07 14:51:27

by Oleksandr Natalenko

[permalink] [raw]
Subject: Re: [igb] netconsole triggers warning in netpoll_poll_dev

Hello.

On Tue, Apr 06, 2021 at 11:48:02AM -0700, Jakub Kicinski wrote:
> On Tue, 6 Apr 2021 14:36:19 +0200 Oleksandr Natalenko wrote:
> > Hello.
> >
> > I've raised this here [1] first, but was suggested to engage igb devs,
> > so here we are.
> >
> > I'm experiencing the following woes while using netconsole regularly:
> >
> > ```
> > [22038.710800] ------------[ cut here ]------------
> > [22038.710801] igb_poll+0x0/0x1440 [igb] exceeded budget in poll
> > [22038.710802] WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155 netpoll_poll_dev+0x18a/0x1a0
> > [22038.710802] Modules linked in: ...
> > [22038.710835] CPU: 12 PID: 40362 Comm: systemd-sleep Not tainted 5.11.0-pf7 #1
> > [22038.710836] Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS 3302 03/05/2021
> > [22038.710836] RIP: 0010:netpoll_poll_dev+0x18a/0x1a0
> > [22038.710837] Code: 6e ff 80 3d d2 9d f8 00 00 0f 85 5c ff ff ff 48 8b 73 28 48 c7 c7 0c b8 21 84 89 44 24 04 c6 05 b6 9d f8 00 01 e8 84 21 1c 00 <0f> 0b 8b 54 24 04 e9 36 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00
> > [22038.710838] RSP: 0018:ffffb24106e37ba0 EFLAGS: 00010086
> > [22038.710838] RAX: 0000000000000000 RBX: ffff9599d2929c50 RCX: ffff959f8ed1ac30
> > [22038.710839] RDX: 0000000000000000 RSI: 0000000000000023 RDI: ffff959f8ed1ac28
> > [22038.710839] RBP: ffff9598981d4058 R08: 0000000000000019 R09: ffffb24206e3796d
> > [22038.710839] R10: ffffffffffffffff R11: ffffb24106e37968 R12: ffff959887e51ec8
> > [22038.710840] R13: 000000000000000c R14: 00000000ffffffff R15: ffff9599d2929c60
> > [22038.710840] FS: 00007f3ade370a40(0000) GS:ffff959f8ed00000(0000) knlGS:0000000000000000
> > [22038.710841] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [22038.710841] CR2: 0000000000000000 CR3: 00000003017b0000 CR4: 0000000000350ee0
> > [22038.710841] Call Trace:
> > [22038.710842] netpoll_send_skb+0x185/0x240
> > [22038.710842] write_msg+0xe5/0x100 [netconsole]
> > [22038.710842] console_unlock+0x37d/0x640
> > [22038.710842] ? __schedule+0x2e5/0xc90
> > [22038.710843] suspend_devices_and_enter+0x2ac/0x7f0
> > [22038.710843] pm_suspend.cold+0x321/0x36c
> > [22038.710843] state_store+0xa6/0x140
> > [22038.710844] kernfs_fop_write_iter+0x124/0x1b0
> > [22038.710844] new_sync_write+0x16a/0x200
> > [22038.710844] vfs_write+0x21c/0x2e0
> > [22038.710844] __x64_sys_write+0x6d/0xf0
> > [22038.710845] do_syscall_64+0x33/0x40
> > [22038.710845] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [22038.710845] RIP: 0033:0x7f3adece10f7
> > [22038.710846] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> > [22038.710847] RSP: 002b:00007ffc51c555b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > [22038.710847] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f3adece10f7
> > [22038.710848] RDX: 0000000000000004 RSI: 00007ffc51c556a0 RDI: 0000000000000004
> > [22038.710848] RBP: 00007ffc51c556a0 R08: 000055ea374302a0 R09: 00007f3aded770c0
> > [22038.710849] R10: 00007f3aded76fc0 R11: 0000000000000246 R12: 0000000000000004
> > [22038.710849] R13: 000055ea3742c430 R14: 0000000000000004 R15: 00007f3adedb3700
> > [22038.710849] ---[ end trace 6eae54fbf23807f8 ]---
> > ```
> >
> > This one happened during suspend/resume cycle (on resume), followed by:
> >
> > ```
> > [22038.868669] igb 0000:05:00.0 enp5s0: Reset adapter
> > [22040.998673] igb 0000:05:00.0 enp5s0: Reset adapter
> > [22043.819198] igb 0000:05:00.0 enp5s0: igb: enp5s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> > ```
> >
> > I've bumped into a similar issue in BZ 211911 [2] (see c#16),
> > and in c#17 it was suggested it was a separate unrelated issue,
> > hence I'm raising a new concern.
> >
> > Please help in finding out why this woe happens and in fixing it.
> >
> > Thanks.
> >
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=212573
> > [2] https://bugzilla.kernel.org/show_bug.cgi?id=211911
>
> Looks like igb_clean_tx_irq() should not return true if budget is 0,
> ever, otherwise we risk hitting the min(work, budget - 1) which may
> go negative.
>
> So something like this?
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index c9e8c65a3cfe..7a237b5311ca 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -8028,7 +8028,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
> unsigned int i = tx_ring->next_to_clean;
>
> if (test_bit(__IGB_DOWN, &adapter->state))
> - return true;
> + goto out;
>
> tx_buffer = &tx_ring->tx_buffer_info[i];
> tx_desc = IGB_TX_DESC(tx_ring, i);
> @@ -8157,7 +8157,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
> tx_ring->queue_index);
>
> /* we are about to reset, no point in enabling stuff */
> - return true;
> + goto out;
> }
> }
>
> @@ -8180,7 +8180,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
> u64_stats_update_end(&tx_ring->tx_syncp);
> }
> }
> -
> +out:
> return !!budget;
> }

Thanks for the effort, but reportedly [1] it made no difference,
unfortunately.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=212573#c8

--
Oleksandr Natalenko (post-factum)

2021-04-07 21:37:12

by Alexander Duyck

[permalink] [raw]
Subject: Re: [igb] netconsole triggers warning in netpoll_poll_dev

On Wed, Apr 7, 2021 at 8:37 AM Jakub Kicinski <[email protected]> wrote:
>
> On Wed, 7 Apr 2021 08:00:53 +0200 Oleksandr Natalenko wrote:
> > Thanks for the effort, but reportedly [1] it made no difference,
> > unfortunately.
> >
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=212573#c8
>
> The only other option I see is that somehow the NAPI has no rings.
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index a45cd2b416c8..24568adc2fb1 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -7980,7 +7980,7 @@ static int igb_poll(struct napi_struct *napi, int budget)
> struct igb_q_vector *q_vector = container_of(napi,
> struct igb_q_vector,
> napi);
> - bool clean_complete = true;
> + bool clean_complete = q_vector->tx.ring || q_vector->rx.ring;
> int work_done = 0;
>
> #ifdef CONFIG_IGB_DCA

It might make sense to just cast the work_done as a unsigned int, and
then on the end of igb_poll use:
return min_t(unsigned int, work_done, budget - 1);

2021-04-07 21:49:50

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [igb] netconsole triggers warning in netpoll_poll_dev

On Wed, 7 Apr 2021 09:25:28 -0700 Alexander Duyck wrote:
> On Wed, Apr 7, 2021 at 8:37 AM Jakub Kicinski <[email protected]> wrote:
> >
> > On Wed, 7 Apr 2021 08:00:53 +0200 Oleksandr Natalenko wrote:
> > > Thanks for the effort, but reportedly [1] it made no difference,
> > > unfortunately.
> > >
> > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=212573#c8
> >
> > The only other option I see is that somehow the NAPI has no rings.
> >
> > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> > index a45cd2b416c8..24568adc2fb1 100644
> > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > @@ -7980,7 +7980,7 @@ static int igb_poll(struct napi_struct *napi, int budget)
> > struct igb_q_vector *q_vector = container_of(napi,
> > struct igb_q_vector,
> > napi);
> > - bool clean_complete = true;
> > + bool clean_complete = q_vector->tx.ring || q_vector->rx.ring;
> > int work_done = 0;
> >
> > #ifdef CONFIG_IGB_DCA
>
> It might make sense to just cast the work_done as a unsigned int, and
> then on the end of igb_poll use:
> return min_t(unsigned int, work_done, budget - 1);

Sure, that's simplest. I wasn't sure something is supposed to prevent
this condition or if it's okay to cover it up.

2021-04-07 22:39:31

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [igb] netconsole triggers warning in netpoll_poll_dev

On Wed, 7 Apr 2021 08:00:53 +0200 Oleksandr Natalenko wrote:
> Thanks for the effort, but reportedly [1] it made no difference,
> unfortunately.
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=212573#c8

The only other option I see is that somehow the NAPI has no rings.

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index a45cd2b416c8..24568adc2fb1 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -7980,7 +7980,7 @@ static int igb_poll(struct napi_struct *napi, int budget)
struct igb_q_vector *q_vector = container_of(napi,
struct igb_q_vector,
napi);
- bool clean_complete = true;
+ bool clean_complete = q_vector->tx.ring || q_vector->rx.ring;
int work_done = 0;

#ifdef CONFIG_IGB_DCA

2021-04-07 23:08:13

by Alexander Duyck

[permalink] [raw]
Subject: Re: [igb] netconsole triggers warning in netpoll_poll_dev

On Wed, Apr 7, 2021 at 11:07 AM Jakub Kicinski <[email protected]> wrote:
>
> On Wed, 7 Apr 2021 09:25:28 -0700 Alexander Duyck wrote:
> > On Wed, Apr 7, 2021 at 8:37 AM Jakub Kicinski <[email protected]> wrote:
> > >
> > > On Wed, 7 Apr 2021 08:00:53 +0200 Oleksandr Natalenko wrote:
> > > > Thanks for the effort, but reportedly [1] it made no difference,
> > > > unfortunately.
> > > >
> > > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=212573#c8
> > >
> > > The only other option I see is that somehow the NAPI has no rings.
> > >
> > > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> > > index a45cd2b416c8..24568adc2fb1 100644
> > > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > > @@ -7980,7 +7980,7 @@ static int igb_poll(struct napi_struct *napi, int budget)
> > > struct igb_q_vector *q_vector = container_of(napi,
> > > struct igb_q_vector,
> > > napi);
> > > - bool clean_complete = true;
> > > + bool clean_complete = q_vector->tx.ring || q_vector->rx.ring;
> > > int work_done = 0;
> > >
> > > #ifdef CONFIG_IGB_DCA
> >
> > It might make sense to just cast the work_done as a unsigned int, and
> > then on the end of igb_poll use:
> > return min_t(unsigned int, work_done, budget - 1);
>
> Sure, that's simplest. I wasn't sure something is supposed to prevent
> this condition or if it's okay to cover it up.

I'm pretty sure it is okay to cover it up. In this case the "budget -
1" is supposed to be the upper limit on what can be reported. I think
it was assuming an unsigned value anyway.

Another alternative would be to default clean_complete to !!budget.
Then if budget is 0 clean_complete would always return false.

2021-04-23 08:20:54

by Oleksandr Natalenko

[permalink] [raw]
Subject: Re: [igb] netconsole triggers warning in netpoll_poll_dev

Hello.

On Wed, Apr 07, 2021 at 04:06:29PM -0700, Alexander Duyck wrote:
> On Wed, Apr 7, 2021 at 11:07 AM Jakub Kicinski <[email protected]> wrote:
> >
> > On Wed, 7 Apr 2021 09:25:28 -0700 Alexander Duyck wrote:
> > > On Wed, Apr 7, 2021 at 8:37 AM Jakub Kicinski <[email protected]> wrote:
> > > >
> > > > On Wed, 7 Apr 2021 08:00:53 +0200 Oleksandr Natalenko wrote:
> > > > > Thanks for the effort, but reportedly [1] it made no difference,
> > > > > unfortunately.
> > > > >
> > > > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=212573#c8
> > > >
> > > > The only other option I see is that somehow the NAPI has no rings.
> > > >
> > > > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> > > > index a45cd2b416c8..24568adc2fb1 100644
> > > > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > > > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > > > @@ -7980,7 +7980,7 @@ static int igb_poll(struct napi_struct *napi, int budget)
> > > > struct igb_q_vector *q_vector = container_of(napi,
> > > > struct igb_q_vector,
> > > > napi);
> > > > - bool clean_complete = true;
> > > > + bool clean_complete = q_vector->tx.ring || q_vector->rx.ring;
> > > > int work_done = 0;
> > > >
> > > > #ifdef CONFIG_IGB_DCA
> > >
> > > It might make sense to just cast the work_done as a unsigned int, and
> > > then on the end of igb_poll use:
> > > return min_t(unsigned int, work_done, budget - 1);
> >
> > Sure, that's simplest. I wasn't sure something is supposed to prevent
> > this condition or if it's okay to cover it up.
>
> I'm pretty sure it is okay to cover it up. In this case the "budget -
> 1" is supposed to be the upper limit on what can be reported. I think
> it was assuming an unsigned value anyway.
>
> Another alternative would be to default clean_complete to !!budget.
> Then if budget is 0 clean_complete would always return false.

So, among all the variants, which one to try? Or there was a separate
patch sent to address this?

Thanks.

--
Oleksandr Natalenko (post-factum)

2021-04-23 23:04:11

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [igb] netconsole triggers warning in netpoll_poll_dev

On Fri, 23 Apr 2021 10:19:44 +0200 Oleksandr Natalenko wrote:
> On Wed, Apr 07, 2021 at 04:06:29PM -0700, Alexander Duyck wrote:
> > On Wed, Apr 7, 2021 at 11:07 AM Jakub Kicinski <[email protected]> wrote:
> > > Sure, that's simplest. I wasn't sure something is supposed to prevent
> > > this condition or if it's okay to cover it up.
> >
> > I'm pretty sure it is okay to cover it up. In this case the "budget -
> > 1" is supposed to be the upper limit on what can be reported. I think
> > it was assuming an unsigned value anyway.
> >
> > Another alternative would be to default clean_complete to !!budget.
> > Then if budget is 0 clean_complete would always return false.
>
> So, among all the variants, which one to try? Or there was a separate
> patch sent to address this?

Alex's suggestion is probably best.

I'm not aware of the fix being posted. Perhaps you could take over and
post the patch if Intel doesn't chime in?

2021-04-26 06:48:24

by Oleksandr Natalenko

[permalink] [raw]
Subject: Re: [igb] netconsole triggers warning in netpoll_poll_dev

Hello.

On Fri, Apr 23, 2021 at 03:58:36PM -0700, Jakub Kicinski wrote:
> On Fri, 23 Apr 2021 10:19:44 +0200 Oleksandr Natalenko wrote:
> > On Wed, Apr 07, 2021 at 04:06:29PM -0700, Alexander Duyck wrote:
> > > On Wed, Apr 7, 2021 at 11:07 AM Jakub Kicinski <[email protected]> wrote:
> > > > Sure, that's simplest. I wasn't sure something is supposed to prevent
> > > > this condition or if it's okay to cover it up.
> > >
> > > I'm pretty sure it is okay to cover it up. In this case the "budget -
> > > 1" is supposed to be the upper limit on what can be reported. I think
> > > it was assuming an unsigned value anyway.
> > >
> > > Another alternative would be to default clean_complete to !!budget.
> > > Then if budget is 0 clean_complete would always return false.
> >
> > So, among all the variants, which one to try? Or there was a separate
> > patch sent to address this?
>
> Alex's suggestion is probably best.
>
> I'm not aware of the fix being posted. Perhaps you could take over and
> post the patch if Intel doesn't chime in?

So, IIUC, Alex suggests this:

```
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index a45cd2b416c8..7503d5bf168a 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -7981,7 +7981,7 @@ static int igb_poll(struct napi_struct *napi, int budget)
struct igb_q_vector,
napi);
bool clean_complete = true;
- int work_done = 0;
+ unsigned int work_done = 0;

#ifdef CONFIG_IGB_DCA
if (q_vector->adapter->flags & IGB_FLAG_DCA_ENABLED)
@@ -8008,7 +8008,7 @@ static int igb_poll(struct napi_struct *napi, int budget)
if (likely(napi_complete_done(napi, work_done)))
igb_ring_irq_enable(q_vector);

- return min(work_done, budget - 1);
+ return min_t(unsigned int, work_done, budget - 1);
}

/**
```

Am I right?

Thanks.

--
Oleksandr Natalenko (post-factum)

2021-04-26 15:30:02

by Alexander Duyck

[permalink] [raw]
Subject: Re: [igb] netconsole triggers warning in netpoll_poll_dev

On Sun, Apr 25, 2021 at 11:47 PM Oleksandr Natalenko
<[email protected]> wrote:
>
> Hello.
>
> On Fri, Apr 23, 2021 at 03:58:36PM -0700, Jakub Kicinski wrote:
> > On Fri, 23 Apr 2021 10:19:44 +0200 Oleksandr Natalenko wrote:
> > > On Wed, Apr 07, 2021 at 04:06:29PM -0700, Alexander Duyck wrote:
> > > > On Wed, Apr 7, 2021 at 11:07 AM Jakub Kicinski <[email protected]> wrote:
> > > > > Sure, that's simplest. I wasn't sure something is supposed to prevent
> > > > > this condition or if it's okay to cover it up.
> > > >
> > > > I'm pretty sure it is okay to cover it up. In this case the "budget -
> > > > 1" is supposed to be the upper limit on what can be reported. I think
> > > > it was assuming an unsigned value anyway.
> > > >
> > > > Another alternative would be to default clean_complete to !!budget.
> > > > Then if budget is 0 clean_complete would always return false.
> > >
> > > So, among all the variants, which one to try? Or there was a separate
> > > patch sent to address this?
> >
> > Alex's suggestion is probably best.
> >
> > I'm not aware of the fix being posted. Perhaps you could take over and
> > post the patch if Intel doesn't chime in?
>
> So, IIUC, Alex suggests this:
>
> ```
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index a45cd2b416c8..7503d5bf168a 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -7981,7 +7981,7 @@ static int igb_poll(struct napi_struct *napi, int budget)
> struct igb_q_vector,
> napi);
> bool clean_complete = true;
> - int work_done = 0;
> + unsigned int work_done = 0;
>
> #ifdef CONFIG_IGB_DCA
> if (q_vector->adapter->flags & IGB_FLAG_DCA_ENABLED)
> @@ -8008,7 +8008,7 @@ static int igb_poll(struct napi_struct *napi, int budget)
> if (likely(napi_complete_done(napi, work_done)))
> igb_ring_irq_enable(q_vector);
>
> - return min(work_done, budget - 1);
> + return min_t(unsigned int, work_done, budget - 1);
> }
>
> /**
> ```
>
> Am I right?
>
> Thanks.

Actually a better way to go would be to probably just initialize
"clean_complete = !!budget". With that we don't have it messing with
the interrupt enables which would probably be a better behavior.

2021-05-07 01:09:38

by Jesse Brandeburg

[permalink] [raw]
Subject: Re: [igb] netconsole triggers warning in netpoll_poll_dev

Alexander Duyck wrote:

> On Sun, Apr 25, 2021 at 11:47 PM Oleksandr Natalenko
> <[email protected]> wrote:
> >
> > Hello.
> >
> > On Fri, Apr 23, 2021 at 03:58:36PM -0700, Jakub Kicinski wrote:
> > > On Fri, 23 Apr 2021 10:19:44 +0200 Oleksandr Natalenko wrote:
> > > > On Wed, Apr 07, 2021 at 04:06:29PM -0700, Alexander Duyck wrote:
> > > > > On Wed, Apr 7, 2021 at 11:07 AM Jakub Kicinski <[email protected]> wrote:
> > > > > > Sure, that's simplest. I wasn't sure something is supposed to prevent
> > > > > > this condition or if it's okay to cover it up.
> > > > >
> > > > > I'm pretty sure it is okay to cover it up. In this case the "budget -
> > > > > 1" is supposed to be the upper limit on what can be reported. I think
> > > > > it was assuming an unsigned value anyway.
> > > > >
> > > > > Another alternative would be to default clean_complete to !!budget.
> > > > > Then if budget is 0 clean_complete would always return false.
> > > >
> > > > So, among all the variants, which one to try? Or there was a separate
> > > > patch sent to address this?
> > >
> > > Alex's suggestion is probably best.
> > >
> > > I'm not aware of the fix being posted. Perhaps you could take over and
> > > post the patch if Intel doesn't chime in?
> >
> > So, IIUC, Alex suggests this:
> >
> > ```
> > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> > index a45cd2b416c8..7503d5bf168a 100644
> > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > @@ -7981,7 +7981,7 @@ static int igb_poll(struct napi_struct *napi, int budget)
> > struct igb_q_vector,
> > napi);
> > bool clean_complete = true;
> > - int work_done = 0;
> > + unsigned int work_done = 0;
> >
> > #ifdef CONFIG_IGB_DCA
> > if (q_vector->adapter->flags & IGB_FLAG_DCA_ENABLED)
> > @@ -8008,7 +8008,7 @@ static int igb_poll(struct napi_struct *napi, int budget)
> > if (likely(napi_complete_done(napi, work_done)))
> > igb_ring_irq_enable(q_vector);
> >
> > - return min(work_done, budget - 1);
> > + return min_t(unsigned int, work_done, budget - 1);
> > }
> >
> > /**
> > ```
> >
> > Am I right?
> >
> > Thanks.
>
> Actually a better way to go would be to probably just initialize
> "clean_complete = !!budget". With that we don't have it messing with
> the interrupt enables which would probably be a better behavior.


Thanks guys for the suggestions here! Finally got some time for
this, so here is the patch I'm going to queue shortly.

From ffd24e90d688ee347ab051266bfc7fca00324a68 Mon Sep 17 00:00:00 2001
From: Jesse Brandeburg <[email protected]>
Date: Thu, 6 May 2021 14:41:11 -0700
Subject: [PATCH net] igb: fix netpoll exit with traffic
To: netdev,
Oleksandr Natalenko <[email protected]>
Cc: Jakub Kicinski <[email protected]>, LKML <[email protected]>, "Brandeburg, Jesse" <[email protected]>, "Nguyen, Anthony L" <[email protected]>, "David S. Miller" <[email protected]>, intel-wired-lan <[email protected]>, Alexander Duyck <[email protected]>

Oleksandr brought a bug report where netpoll causes trace messages in
the log on igb.

[22038.710800] ------------[ cut here ]------------
[22038.710801] igb_poll+0x0/0x1440 [igb] exceeded budget in poll
[22038.710802] WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155 netpoll_poll_dev+0x18a/0x1a0

After some discussion and debug from the list, it was deemed that the
right thing to do is initialize the clean_complete variable to false
when the "netpoll mode" of passing a zero budget is used.

This logic should be sane and not risky because the only time budget
should be zero on entry is netpoll. Change includes a small refactor
of local variable assignments to clean up the look.

Fixes: 16eb8815c235 ("igb: Refactor clean_rx_irq to reduce overhead and improve performance")
Reported-by: Oleksandr Natalenko <[email protected]>
Suggested-by: Alexander Duyck <[email protected]>
Signed-off-by: Jesse Brandeburg <[email protected]>
---

Compile tested ONLY, but functionally it should be exactly the same for
all cases except when budget is zero on entry, which will hopefully fix
the bug.
---
drivers/net/ethernet/intel/igb/igb_main.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 0cd37ad81b4e..b0a9bed14071 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -7991,12 +7991,16 @@ static void igb_ring_irq_enable(struct igb_q_vector *q_vector)
**/
static int igb_poll(struct napi_struct *napi, int budget)
{
- struct igb_q_vector *q_vector = container_of(napi,
- struct igb_q_vector,
- napi);
- bool clean_complete = true;
+ struct igb_q_vector *q_vector;
+ bool clean_complete;
int work_done = 0;

+ /* if budget is zero, we have a special case for netconsole, so
+ * make sure to set clean_complete to false in that case.
+ */
+ clean_complete = !!budget;
+
+ q_vector = container_of(napi, struct igb_q_vector, napi);
#ifdef CONFIG_IGB_DCA
if (q_vector->adapter->flags & IGB_FLAG_DCA_ENABLED)
igb_update_dca(q_vector);
--
2.30.2

2021-05-07 01:33:29

by Alexander Duyck

[permalink] [raw]
Subject: Re: [igb] netconsole triggers warning in netpoll_poll_dev

On Thu, May 6, 2021 at 4:32 PM Jesse Brandeburg
<[email protected]> wrote:
>
> Alexander Duyck wrote:
>
> > On Sun, Apr 25, 2021 at 11:47 PM Oleksandr Natalenko
> > <[email protected]> wrote:
> > >
> > > Hello.
> > >
> > > On Fri, Apr 23, 2021 at 03:58:36PM -0700, Jakub Kicinski wrote:
> > > > On Fri, 23 Apr 2021 10:19:44 +0200 Oleksandr Natalenko wrote:
> > > > > On Wed, Apr 07, 2021 at 04:06:29PM -0700, Alexander Duyck wrote:
> > > > > > On Wed, Apr 7, 2021 at 11:07 AM Jakub Kicinski <[email protected]> wrote:
> > > > > > > Sure, that's simplest. I wasn't sure something is supposed to prevent
> > > > > > > this condition or if it's okay to cover it up.
> > > > > >
> > > > > > I'm pretty sure it is okay to cover it up. In this case the "budget -
> > > > > > 1" is supposed to be the upper limit on what can be reported. I think
> > > > > > it was assuming an unsigned value anyway.
> > > > > >
> > > > > > Another alternative would be to default clean_complete to !!budget.
> > > > > > Then if budget is 0 clean_complete would always return false.
> > > > >
> > > > > So, among all the variants, which one to try? Or there was a separate
> > > > > patch sent to address this?
> > > >
> > > > Alex's suggestion is probably best.
> > > >
> > > > I'm not aware of the fix being posted. Perhaps you could take over and
> > > > post the patch if Intel doesn't chime in?
> > >
> > > So, IIUC, Alex suggests this:
> > >
> > > ```
> > > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> > > index a45cd2b416c8..7503d5bf168a 100644
> > > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > > @@ -7981,7 +7981,7 @@ static int igb_poll(struct napi_struct *napi, int budget)
> > > struct igb_q_vector,
> > > napi);
> > > bool clean_complete = true;
> > > - int work_done = 0;
> > > + unsigned int work_done = 0;
> > >
> > > #ifdef CONFIG_IGB_DCA
> > > if (q_vector->adapter->flags & IGB_FLAG_DCA_ENABLED)
> > > @@ -8008,7 +8008,7 @@ static int igb_poll(struct napi_struct *napi, int budget)
> > > if (likely(napi_complete_done(napi, work_done)))
> > > igb_ring_irq_enable(q_vector);
> > >
> > > - return min(work_done, budget - 1);
> > > + return min_t(unsigned int, work_done, budget - 1);
> > > }
> > >
> > > /**
> > > ```
> > >
> > > Am I right?
> > >
> > > Thanks.
> >
> > Actually a better way to go would be to probably just initialize
> > "clean_complete = !!budget". With that we don't have it messing with
> > the interrupt enables which would probably be a better behavior.
>
>
> Thanks guys for the suggestions here! Finally got some time for
> this, so here is the patch I'm going to queue shortly.
>
> From ffd24e90d688ee347ab051266bfc7fca00324a68 Mon Sep 17 00:00:00 2001
> From: Jesse Brandeburg <[email protected]>
> Date: Thu, 6 May 2021 14:41:11 -0700
> Subject: [PATCH net] igb: fix netpoll exit with traffic
> To: netdev,
> Oleksandr Natalenko <[email protected]>
> Cc: Jakub Kicinski <[email protected]>, LKML <[email protected]>, "Brandeburg, Jesse" <[email protected]>, "Nguyen, Anthony L" <[email protected]>, "David S. Miller" <[email protected]>, intel-wired-lan <[email protected]>, Alexander Duyck <[email protected]>
>
> Oleksandr brought a bug report where netpoll causes trace messages in
> the log on igb.
>
> [22038.710800] ------------[ cut here ]------------
> [22038.710801] igb_poll+0x0/0x1440 [igb] exceeded budget in poll
> [22038.710802] WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155 netpoll_poll_dev+0x18a/0x1a0
>
> After some discussion and debug from the list, it was deemed that the
> right thing to do is initialize the clean_complete variable to false
> when the "netpoll mode" of passing a zero budget is used.
>
> This logic should be sane and not risky because the only time budget
> should be zero on entry is netpoll. Change includes a small refactor
> of local variable assignments to clean up the look.
>
> Fixes: 16eb8815c235 ("igb: Refactor clean_rx_irq to reduce overhead and improve performance")
> Reported-by: Oleksandr Natalenko <[email protected]>
> Suggested-by: Alexander Duyck <[email protected]>
> Signed-off-by: Jesse Brandeburg <[email protected]>
> ---
>
> Compile tested ONLY, but functionally it should be exactly the same for
> all cases except when budget is zero on entry, which will hopefully fix
> the bug.
> ---
> drivers/net/ethernet/intel/igb/igb_main.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index 0cd37ad81b4e..b0a9bed14071 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -7991,12 +7991,16 @@ static void igb_ring_irq_enable(struct igb_q_vector *q_vector)
> **/
> static int igb_poll(struct napi_struct *napi, int budget)
> {
> - struct igb_q_vector *q_vector = container_of(napi,
> - struct igb_q_vector,
> - napi);
> - bool clean_complete = true;
> + struct igb_q_vector *q_vector;
> + bool clean_complete;
> int work_done = 0;
>
> + /* if budget is zero, we have a special case for netconsole, so
> + * make sure to set clean_complete to false in that case.
> + */
> + clean_complete = !!budget;
> +
> + q_vector = container_of(napi, struct igb_q_vector, napi);
> #ifdef CONFIG_IGB_DCA
> if (q_vector->adapter->flags & IGB_FLAG_DCA_ENABLED)
> igb_update_dca(q_vector);

I'm not a big fan of moving the q_vector init as a part of this patch
since it just means more backport work.

That said the change itself should be harmless so I am good with it either way.

Reviewed-by: Alexander Duyck <[email protected]>

2021-11-18 15:30:41

by Danielle Ratson

[permalink] [raw]
Subject: Re: [igb] netconsole triggers warning in netpoll_poll_dev

> On Thu, May 6, 2021 at 4:32 PM Jesse Brandeburg
> <[email protected]> wrote:
> >
> > Alexander Duyck wrote:
> >
> > > On Sun, Apr 25, 2021 at 11:47 PM Oleksandr Natalenko
> > > <[email protected]> wrote:
> > > >
> > > > Hello.
> > > >
> > > > On Fri, Apr 23, 2021 at 03:58:36PM -0700, Jakub Kicinski wrote:
> > > > > On Fri, 23 Apr 2021 10:19:44 +0200 Oleksandr Natalenko wrote:
> > > > > > On Wed, Apr 07, 2021 at 04:06:29PM -0700, Alexander Duyck wrote:
> > > > > > > On Wed, Apr 7, 2021 at 11:07 AM Jakub Kicinski
> <[email protected]> wrote:
> > > > > > > > Sure, that's simplest. I wasn't sure something is supposed
> > > > > > > > to prevent this condition or if it's okay to cover it up.
> > > > > > >
> > > > > > > I'm pretty sure it is okay to cover it up. In this case the
> > > > > > > "budget - 1" is supposed to be the upper limit on what can
> > > > > > > be reported. I think it was assuming an unsigned value anyway.
> > > > > > >
> > > > > > > Another alternative would be to default clean_complete to
> !!budget.
> > > > > > > Then if budget is 0 clean_complete would always return false.
> > > > > >
> > > > > > So, among all the variants, which one to try? Or there was a
> > > > > > separate patch sent to address this?
> > > > >
> > > > > Alex's suggestion is probably best.
> > > > >
> > > > > I'm not aware of the fix being posted. Perhaps you could take
> > > > > over and post the patch if Intel doesn't chime in?
> > > >
> > > > So, IIUC, Alex suggests this:
> > > >
> > > > ```
> > > > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> > > > b/drivers/net/ethernet/intel/igb/igb_main.c
> > > > index a45cd2b416c8..7503d5bf168a 100644
> > > > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > > > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > > > @@ -7981,7 +7981,7 @@ static int igb_poll(struct napi_struct *napi, int
> budget)
> > > > struct igb_q_vector,
> > > > napi);
> > > > bool clean_complete = true;
> > > > - int work_done = 0;
> > > > + unsigned int work_done = 0;
> > > >
> > > > #ifdef CONFIG_IGB_DCA
> > > > if (q_vector->adapter->flags & IGB_FLAG_DCA_ENABLED) @@
> > > > -8008,7 +8008,7 @@ static int igb_poll(struct napi_struct *napi, int
> budget)
> > > > if (likely(napi_complete_done(napi, work_done)))
> > > > igb_ring_irq_enable(q_vector);
> > > >
> > > > - return min(work_done, budget - 1);
> > > > + return min_t(unsigned int, work_done, budget - 1);
> > > > }
> > > >
> > > > /**
> > > > ```
> > > >
> > > > Am I right?
> > > >
> > > > Thanks.
> > >
> > > Actually a better way to go would be to probably just initialize
> > > "clean_complete = !!budget". With that we don't have it messing with
> > > the interrupt enables which would probably be a better behavior.
> >
> >
> > Thanks guys for the suggestions here! Finally got some time for this,
> > so here is the patch I'm going to queue shortly.
> >
> > From ffd24e90d688ee347ab051266bfc7fca00324a68 Mon Sep 17 00:00:00
> 2001
> > From: Jesse Brandeburg <[email protected]>
> > Date: Thu, 6 May 2021 14:41:11 -0700
> > Subject: [PATCH net] igb: fix netpoll exit with traffic
> > To: netdev,
> > Oleksandr Natalenko <[email protected]>
> > Cc: Jakub Kicinski <[email protected]>, LKML
> > <[email protected]>, "Brandeburg, Jesse"
> > <[email protected]>, "Nguyen, Anthony L"
> > <[email protected]>, "David S. Miller"
> <[email protected]>,
> > intel-wired-lan <[email protected]>, Alexander Duyck
> > <[email protected]>
> >
> > Oleksandr brought a bug report where netpoll causes trace messages in
> > the log on igb.
> >
> > [22038.710800] ------------[ cut here ]------------ [22038.710801]
> > igb_poll+0x0/0x1440 [igb] exceeded budget in poll [22038.710802]
> > WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155
> > netpoll_poll_dev+0x18a/0x1a0
> >
> > After some discussion and debug from the list, it was deemed that the
> > right thing to do is initialize the clean_complete variable to false
> > when the "netpoll mode" of passing a zero budget is used.
> >
> > This logic should be sane and not risky because the only time budget
> > should be zero on entry is netpoll. Change includes a small refactor
> > of local variable assignments to clean up the look.
> >
> > Fixes: 16eb8815c235 ("igb: Refactor clean_rx_irq to reduce overhead
> > and improve performance")
> > Reported-by: Oleksandr Natalenko <[email protected]>
> > Suggested-by: Alexander Duyck <[email protected]>
> > Signed-off-by: Jesse Brandeburg <[email protected]>
> > ---
> >
> > Compile tested ONLY, but functionally it should be exactly the same
> > for all cases except when budget is zero on entry, which will
> > hopefully fix the bug.
> > ---
> > drivers/net/ethernet/intel/igb/igb_main.c | 12 ++++++++----
> > 1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> > b/drivers/net/ethernet/intel/igb/igb_main.c
> > index 0cd37ad81b4e..b0a9bed14071 100644
> > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > @@ -7991,12 +7991,16 @@ static void igb_ring_irq_enable(struct
> igb_q_vector *q_vector)
> > **/
> > static int igb_poll(struct napi_struct *napi, int budget) {
> > - struct igb_q_vector *q_vector = container_of(napi,
> > - struct igb_q_vector,
> > - napi);
> > - bool clean_complete = true;
> > + struct igb_q_vector *q_vector;
> > + bool clean_complete;
> > int work_done = 0;
> >
> > + /* if budget is zero, we have a special case for netconsole, so
> > + * make sure to set clean_complete to false in that case.
> > + */
> > + clean_complete = !!budget;
> > +
> > + q_vector = container_of(napi, struct igb_q_vector, napi);
> > #ifdef CONFIG_IGB_DCA
> > if (q_vector->adapter->flags & IGB_FLAG_DCA_ENABLED)
> > igb_update_dca(q_vector);
>
> I'm not a big fan of moving the q_vector init as a part of this patch since it
> just means more backport work.
>
> That said the change itself should be harmless so I am good with it either
> way.
>
> Reviewed-by: Alexander Duyck <[email protected]>

Hi,

I have lately added the netconsole module, and since then we see the same warning constantly in the logs.
I have tried to apply Jesse's patch but it didn't seem to solve the issue.

Did anyone managed to solve the issue and can share with us?

Thanks,
Danielle

2021-11-18 16:14:38

by Alexander Duyck

[permalink] [raw]
Subject: Re: [igb] netconsole triggers warning in netpoll_poll_dev

On Thu, Nov 18, 2021 at 7:30 AM Danielle Ratson <[email protected]> wrote:
>
> > On Thu, May 6, 2021 at 4:32 PM Jesse Brandeburg
> > <[email protected]> wrote:
> > >
> > > Alexander Duyck wrote:
> > >
> > > > On Sun, Apr 25, 2021 at 11:47 PM Oleksandr Natalenko
> > > > <[email protected]> wrote:
> > > > >
> > > > > Hello.
> > > > >
> > > > > On Fri, Apr 23, 2021 at 03:58:36PM -0700, Jakub Kicinski wrote:
> > > > > > On Fri, 23 Apr 2021 10:19:44 +0200 Oleksandr Natalenko wrote:
> > > > > > > On Wed, Apr 07, 2021 at 04:06:29PM -0700, Alexander Duyck wrote:
> > > > > > > > On Wed, Apr 7, 2021 at 11:07 AM Jakub Kicinski
> > <[email protected]> wrote:
> > > > > > > > > Sure, that's simplest. I wasn't sure something is supposed
> > > > > > > > > to prevent this condition or if it's okay to cover it up.
> > > > > > > >
> > > > > > > > I'm pretty sure it is okay to cover it up. In this case the
> > > > > > > > "budget - 1" is supposed to be the upper limit on what can
> > > > > > > > be reported. I think it was assuming an unsigned value anyway.
> > > > > > > >
> > > > > > > > Another alternative would be to default clean_complete to
> > !!budget.
> > > > > > > > Then if budget is 0 clean_complete would always return false.
> > > > > > >
> > > > > > > So, among all the variants, which one to try? Or there was a
> > > > > > > separate patch sent to address this?
> > > > > >
> > > > > > Alex's suggestion is probably best.
> > > > > >
> > > > > > I'm not aware of the fix being posted. Perhaps you could take
> > > > > > over and post the patch if Intel doesn't chime in?
> > > > >
> > > > > So, IIUC, Alex suggests this:
> > > > >
> > > > > ```
> > > > > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> > > > > b/drivers/net/ethernet/intel/igb/igb_main.c
> > > > > index a45cd2b416c8..7503d5bf168a 100644
> > > > > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > > > > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > > > > @@ -7981,7 +7981,7 @@ static int igb_poll(struct napi_struct *napi, int
> > budget)
> > > > > struct igb_q_vector,
> > > > > napi);
> > > > > bool clean_complete = true;
> > > > > - int work_done = 0;
> > > > > + unsigned int work_done = 0;
> > > > >
> > > > > #ifdef CONFIG_IGB_DCA
> > > > > if (q_vector->adapter->flags & IGB_FLAG_DCA_ENABLED) @@
> > > > > -8008,7 +8008,7 @@ static int igb_poll(struct napi_struct *napi, int
> > budget)
> > > > > if (likely(napi_complete_done(napi, work_done)))
> > > > > igb_ring_irq_enable(q_vector);
> > > > >
> > > > > - return min(work_done, budget - 1);
> > > > > + return min_t(unsigned int, work_done, budget - 1);
> > > > > }
> > > > >
> > > > > /**
> > > > > ```
> > > > >
> > > > > Am I right?
> > > > >
> > > > > Thanks.
> > > >
> > > > Actually a better way to go would be to probably just initialize
> > > > "clean_complete = !!budget". With that we don't have it messing with
> > > > the interrupt enables which would probably be a better behavior.
> > >
> > >
> > > Thanks guys for the suggestions here! Finally got some time for this,
> > > so here is the patch I'm going to queue shortly.
> > >
> > > From ffd24e90d688ee347ab051266bfc7fca00324a68 Mon Sep 17 00:00:00
> > 2001
> > > From: Jesse Brandeburg <[email protected]>
> > > Date: Thu, 6 May 2021 14:41:11 -0700
> > > Subject: [PATCH net] igb: fix netpoll exit with traffic
> > > To: netdev,
> > > Oleksandr Natalenko <[email protected]>
> > > Cc: Jakub Kicinski <[email protected]>, LKML
> > > <[email protected]>, "Brandeburg, Jesse"
> > > <[email protected]>, "Nguyen, Anthony L"
> > > <[email protected]>, "David S. Miller"
> > <[email protected]>,
> > > intel-wired-lan <[email protected]>, Alexander Duyck
> > > <[email protected]>
> > >
> > > Oleksandr brought a bug report where netpoll causes trace messages in
> > > the log on igb.
> > >
> > > [22038.710800] ------------[ cut here ]------------ [22038.710801]
> > > igb_poll+0x0/0x1440 [igb] exceeded budget in poll [22038.710802]
> > > WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155
> > > netpoll_poll_dev+0x18a/0x1a0
> > >
> > > After some discussion and debug from the list, it was deemed that the
> > > right thing to do is initialize the clean_complete variable to false
> > > when the "netpoll mode" of passing a zero budget is used.
> > >
> > > This logic should be sane and not risky because the only time budget
> > > should be zero on entry is netpoll. Change includes a small refactor
> > > of local variable assignments to clean up the look.
> > >
> > > Fixes: 16eb8815c235 ("igb: Refactor clean_rx_irq to reduce overhead
> > > and improve performance")
> > > Reported-by: Oleksandr Natalenko <[email protected]>
> > > Suggested-by: Alexander Duyck <[email protected]>
> > > Signed-off-by: Jesse Brandeburg <[email protected]>
> > > ---
> > >
> > > Compile tested ONLY, but functionally it should be exactly the same
> > > for all cases except when budget is zero on entry, which will
> > > hopefully fix the bug.
> > > ---
> > > drivers/net/ethernet/intel/igb/igb_main.c | 12 ++++++++----
> > > 1 file changed, 8 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> > > b/drivers/net/ethernet/intel/igb/igb_main.c
> > > index 0cd37ad81b4e..b0a9bed14071 100644
> > > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > > @@ -7991,12 +7991,16 @@ static void igb_ring_irq_enable(struct
> > igb_q_vector *q_vector)
> > > **/
> > > static int igb_poll(struct napi_struct *napi, int budget) {
> > > - struct igb_q_vector *q_vector = container_of(napi,
> > > - struct igb_q_vector,
> > > - napi);
> > > - bool clean_complete = true;
> > > + struct igb_q_vector *q_vector;
> > > + bool clean_complete;
> > > int work_done = 0;
> > >
> > > + /* if budget is zero, we have a special case for netconsole, so
> > > + * make sure to set clean_complete to false in that case.
> > > + */
> > > + clean_complete = !!budget;
> > > +
> > > + q_vector = container_of(napi, struct igb_q_vector, napi);
> > > #ifdef CONFIG_IGB_DCA
> > > if (q_vector->adapter->flags & IGB_FLAG_DCA_ENABLED)
> > > igb_update_dca(q_vector);
> >
> > I'm not a big fan of moving the q_vector init as a part of this patch since it
> > just means more backport work.
> >
> > That said the change itself should be harmless so I am good with it either
> > way.
> >
> > Reviewed-by: Alexander Duyck <[email protected]>
>
> Hi,
>
> I have lately added the netconsole module, and since then we see the same warning constantly in the logs.
> I have tried to apply Jesse's patch but it didn't seem to solve the issue.
>
> Did anyone managed to solve the issue and can share with us?
>
> Thanks,
> Danielle

The one issue I can see is that it basically leaves the igb_poll call
stuck in polling mode.

The easiest fix for all of this in the in-kernel driver is to just get
rid of the "min" at the end and instead just "return work_done;". The
extra complication is only needed if you were to be polling multiple
queues and that isn't the case here so we should simplify it and get
rid of the buggy "budget - 1" return value.

2021-11-21 11:44:58

by Danielle Ratson

[permalink] [raw]
Subject: RE: [igb] netconsole triggers warning in netpoll_poll_dev

> > > > drivers/net/ethernet/intel/igb/igb_main.c | 12 ++++++++----
> > > > 1 file changed, 8 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> > > > b/drivers/net/ethernet/intel/igb/igb_main.c
> > > > index 0cd37ad81b4e..b0a9bed14071 100644
> > > > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > > > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > > > @@ -7991,12 +7991,16 @@ static void igb_ring_irq_enable(struct
> > > igb_q_vector *q_vector)
> > > > **/
> > > > static int igb_poll(struct napi_struct *napi, int budget) {
> > > > - struct igb_q_vector *q_vector = container_of(napi,
> > > > - struct igb_q_vector,
> > > > - napi);
> > > > - bool clean_complete = true;
> > > > + struct igb_q_vector *q_vector;
> > > > + bool clean_complete;
> > > > int work_done = 0;
> > > >
> > > > + /* if budget is zero, we have a special case for netconsole, so
> > > > + * make sure to set clean_complete to false in that case.
> > > > + */
> > > > + clean_complete = !!budget;
> > > > +
> > > > + q_vector = container_of(napi, struct igb_q_vector, napi);
> > > > #ifdef CONFIG_IGB_DCA
> > > > if (q_vector->adapter->flags & IGB_FLAG_DCA_ENABLED)
> > > > igb_update_dca(q_vector);
> > >
> > > I'm not a big fan of moving the q_vector init as a part of this
> > > patch since it just means more backport work.
> > >
> > > That said the change itself should be harmless so I am good with it
> > > either way.
> > >
> > > Reviewed-by: Alexander Duyck <[email protected]>
> >
> > Hi,
> >
> > I have lately added the netconsole module, and since then we see the
> same warning constantly in the logs.
> > I have tried to apply Jesse's patch but it didn't seem to solve the issue.
> >
> > Did anyone managed to solve the issue and can share with us?
> >
> > Thanks,
> > Danielle
>
> The one issue I can see is that it basically leaves the igb_poll call stuck in
> polling mode.
>
> The easiest fix for all of this in the in-kernel driver is to just get rid of the
> "min" at the end and instead just "return work_done;". The extra
> complication is only needed if you were to be polling multiple queues and
> that isn't the case here so we should simplify it and get rid of the buggy
> "budget - 1" return value.

Thank you very much for your reply Alexander!
It seems to work well!

Are you planning to send it upstream?

Thanks,
Danielle

2021-11-21 21:20:48

by Alexander Duyck

[permalink] [raw]
Subject: Re: [igb] netconsole triggers warning in netpoll_poll_dev

On Sun, Nov 21, 2021 at 3:44 AM Danielle Ratson <[email protected]> wrote:
>
> > > > > drivers/net/ethernet/intel/igb/igb_main.c | 12 ++++++++----
> > > > > 1 file changed, 8 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> > > > > b/drivers/net/ethernet/intel/igb/igb_main.c
> > > > > index 0cd37ad81b4e..b0a9bed14071 100644
> > > > > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > > > > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > > > > @@ -7991,12 +7991,16 @@ static void igb_ring_irq_enable(struct
> > > > igb_q_vector *q_vector)
> > > > > **/
> > > > > static int igb_poll(struct napi_struct *napi, int budget) {
> > > > > - struct igb_q_vector *q_vector = container_of(napi,
> > > > > - struct igb_q_vector,
> > > > > - napi);
> > > > > - bool clean_complete = true;
> > > > > + struct igb_q_vector *q_vector;
> > > > > + bool clean_complete;
> > > > > int work_done = 0;
> > > > >
> > > > > + /* if budget is zero, we have a special case for netconsole, so
> > > > > + * make sure to set clean_complete to false in that case.
> > > > > + */
> > > > > + clean_complete = !!budget;
> > > > > +
> > > > > + q_vector = container_of(napi, struct igb_q_vector, napi);
> > > > > #ifdef CONFIG_IGB_DCA
> > > > > if (q_vector->adapter->flags & IGB_FLAG_DCA_ENABLED)
> > > > > igb_update_dca(q_vector);
> > > >
> > > > I'm not a big fan of moving the q_vector init as a part of this
> > > > patch since it just means more backport work.
> > > >
> > > > That said the change itself should be harmless so I am good with it
> > > > either way.
> > > >
> > > > Reviewed-by: Alexander Duyck <[email protected]>
> > >
> > > Hi,
> > >
> > > I have lately added the netconsole module, and since then we see the
> > same warning constantly in the logs.
> > > I have tried to apply Jesse's patch but it didn't seem to solve the issue.
> > >
> > > Did anyone managed to solve the issue and can share with us?
> > >
> > > Thanks,
> > > Danielle
> >
> > The one issue I can see is that it basically leaves the igb_poll call stuck in
> > polling mode.
> >
> > The easiest fix for all of this in the in-kernel driver is to just get rid of the
> > "min" at the end and instead just "return work_done;". The extra
> > complication is only needed if you were to be polling multiple queues and
> > that isn't the case here so we should simplify it and get rid of the buggy
> > "budget - 1" return value.
>
> Thank you very much for your reply Alexander!
> It seems to work well!
>
> Are you planning to send it upstream?

No, I was just suggesting it as a possible solution. Feel free to put
together your own patch and submit it if it is working for you.

- Alex

2021-11-23 20:37:51

by Jesse Brandeburg

[permalink] [raw]
Subject: Re: [igb] netconsole triggers warning in netpoll_poll_dev

On 11/21/2021 1:16 PM, Alexander Duyck wrote:
>>> The easiest fix for all of this in the in-kernel driver is to just get rid of the
>>> "min" at the end and instead just "return work_done;". The extra
>>> complication is only needed if you were to be polling multiple queues and
>>> that isn't the case here so we should simplify it and get rid of the buggy
>>> "budget - 1" return value.
>>
>> Thank you very much for your reply Alexander!
>> It seems to work well!
>>
>> Are you planning to send it upstream?
>
> No, I was just suggesting it as a possible solution. Feel free to put
> together your own patch and submit it if it is working for you.

Thanks for your feedback and work on this, I've prepared a patch and
will send to netdev and intel-wired-lan shortly.

I don't have an easy way to test this one, please let me know what you
think.