2023-12-08 09:29:47

by Marek Szyprowski

[permalink] [raw]
Subject: Re: [PATCH net-next] net: rtnetlink: remove local list in __linkwatch_run_queue()

Hi Johannes,

On 05.12.2023 17:00, Johannes Berg wrote:
> From: Johannes Berg <[email protected]>
>
> Due to linkwatch_forget_dev() (and perhaps others?) checking for
> list_empty(&dev->link_watch_list), we must have all manipulations
> of even the local on-stack list 'wrk' here under spinlock, since
> even that list can be reached otherwise via dev->link_watch_list.
>
> This is already the case, but makes this a bit counter-intuitive,
> often local lists are used to _not_ have to use locking for their
> local use.
>
> Remove the local list as it doesn't seem to serve any purpose.
> While at it, move a variable declaration into the loop using it.
>
> Reviewed-by: Jiri Pirko <[email protected]>
> Signed-off-by: Johannes Berg <[email protected]>

This patch landed in today's linux next-20231208 as commit b8dbbbc535a9
("net: rtnetlink: remove local list in __linkwatch_run_queue()").
Unfortunately it breaks booting of many of my ARM 32bit based test
boards. The issue can be also observed on QEMU's ARM 32bit 'virt'
machine. Booting hangs on very early:

...

e1000e: Intel(R) PRO/1000 Network Driver
e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
igb: Intel(R) Gigabit Ethernet Network Driver
igb: Copyright (c) 2007-2014 Intel Corporation.
pegasus: Pegasus/Pegasus II USB Ethernet driver
usbcore: registered new interface driver pegasus
usbcore: registered new interface driver asix
usbcore: registered new interface driver ax88179_178a
usbcore: registered new interface driver cdc_ether
usbcore: registered new interface driver smsc75xx
usbcore: registered new interface driver smsc95xx
usbcore: registered new interface driver net1080
usbcore: registered new interface driver cdc_subset
usbcore: registered new interface driver zaurus
usbcore: registered new interface driver cdc_ncm
usbcore: registered new interface driver usb-storage
rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
rcu:     1-...0: (1 ticks this GP) idle=b7cc/1/0x40000000 softirq=23/23
fqs=1027
rcu:     (detected by 0, t=2103 jiffies, g=-1163, q=8 ncpus=2)
Sending NMI from CPU 0 to CPUs 1:

Reverting $subject on top of current linux-next tree fixes the issue.


You can reproduce this issue with the following commands:

# make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- multi_v7_defconfig zImage

# qemu-system-arm -nographic -kernel arch/arm/boot/zImage -append
"console=ttyAMA0 no_console_suspend earlycon root=/dev/vda" -M virt -smp
2 -m 512 -netdev user,id=user -device virtio-net-device,netdev=user

(the netdev parameter for qemu is important to trigger the issue, the
lack of rootfs is intentional here, as the hang happens before kernel
searches for rootfs).


> ---
> net/core/link_watch.c | 13 ++++---------
> 1 file changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
> index c469d1c4db5d..ed3e5391fa79 100644
> --- a/net/core/link_watch.c
> +++ b/net/core/link_watch.c
> @@ -192,8 +192,6 @@ static void __linkwatch_run_queue(int urgent_only)
> #define MAX_DO_DEV_PER_LOOP 100
>
> int do_dev = MAX_DO_DEV_PER_LOOP;
> - struct net_device *dev;
> - LIST_HEAD(wrk);
>
> /* Give urgent case more budget */
> if (urgent_only)
> @@ -215,11 +213,11 @@ static void __linkwatch_run_queue(int urgent_only)
> clear_bit(LW_URGENT, &linkwatch_flags);
>
> spin_lock_irq(&lweventlist_lock);
> - list_splice_init(&lweventlist, &wrk);
> + while (!list_empty(&lweventlist) && do_dev > 0) {
> + struct net_device *dev;
>
> - while (!list_empty(&wrk) && do_dev > 0) {
> -
> - dev = list_first_entry(&wrk, struct net_device, link_watch_list);
> + dev = list_first_entry(&lweventlist, struct net_device,
> + link_watch_list);
> list_del_init(&dev->link_watch_list);
>
> if (!netif_device_present(dev) ||
> @@ -237,9 +235,6 @@ static void __linkwatch_run_queue(int urgent_only)
> spin_lock_irq(&lweventlist_lock);
> }
>
> - /* Add the remaining work back to lweventlist */
> - list_splice_init(&wrk, &lweventlist);
> -
> if (!list_empty(&lweventlist))
> linkwatch_schedule_work(0);
> spin_unlock_irq(&lweventlist_lock);

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland