2022-12-13 23:24:35

by Maksim Kiselev

[permalink] [raw]
Subject: Subject: Locking mv88e6xxx_reg_lock twice leads deadlock for 88E6176 switch

Subject: Locking mv88e6xxx_reg_lock twice leads deadlock for 88E6176 switch

Hello, friends.

I have a device with Marvell 88E6176 switch.
After 'mv88e6xxx: fix speed setting for CPU/DSA ports (cc1049ccee20)'commit was applied to
mainline kernel I faced with a problem that switch driver stuck at 'mv88e6xxx_probe' function.

I made some investigations and found that 'mv88e6xxx_reg_lock' called twice from the same thread which leads to deadlock.

I added logs to 'mv88e6xxx_reg_lock' and 'mv88e6xxx_reg_unlock' functions to see what happened.

So, first lock called from mv88e6xxx_setup:

static int mv88e6xxx_setup(struct dsa_switch *ds)
{
...

if (mv88e6xxx_has_pvt(chip))
ds->max_num_bridges = MV88E6XXX_MAX_PVT_SWITCHES -
ds->dst->last_switch - 1;

mv88e6xxx_reg_lock(chip);

And second lock called from mv88e6352_phylink_get_caps (only for port 4):
static void mv88e6352_phylink_get_caps(struct mv88e6xxx_chip *chip, int port,
struct phylink_config *config)
{
...

config->mac_capabilities = MAC_SYM_PAUSE | MAC_10 | MAC_100 |
MAC_1000FD;

/* Port 4 supports automedia if the serdes is associated with it. */
if (port == 4) {
dump_stack(); // I added this to see where we came from
mv88e6xxx_reg_lock(chip);


Here is kernel log with my debug output and call stack right before secondary lock:

[ 5.801203] mv88e6085 mdio@2d24000:0c: switch 0x1760 detected: Marvell 88E6176, revision 1
[ 5.950236] >>>>> (mv88e6xxx_reg_lock|797) from mv88e6xxx_mdio_read+0x54/0xfc [mv88e6xxx] thread_id: 287
[ 5.952884] >>>>> (mv88e6xxx_reg_unlock|804) from mv88e6xxx_mdio_read+0x88/0xfc [mv88e6xxx] thread_id: 287
[ 5.969373] >>>>> (mv88e6xxx_reg_lock|797) from mv88e6xxx_setup+0x5c/0x550 [mv88e6xxx] thread_id: 287
[ 6.023785] >>>>> (mv88e6xxx_reg_lock|797) from mv88e6xxx_g1_irq_thread_work+0x28/0x124 [mv88e6xxx] thread_id: 315

[ 6.069185] Backtrace:
[ 6.069206] dump_backtrace from show_stack+0x18/0x1c
[ 6.069266] r7:f1221aa4 r6:00000004 r5:600b0013 r4:c08d684d
[ 6.069279] show_stack from dump_stack_lvl+0x48/0x54
[ 6.069315] dump_stack_lvl from dump_stack+0x14/0x1c
[ 6.069351] r5:f1221ab4 r4:c1e40040
[ 6.069363] dump_stack from mv88e6352_phylink_get_caps+0x58/0x164 [mv88e6xxx]
[ 6.069645] mv88e6352_phylink_get_caps [mv88e6xxx] from mv88e6xxx_get_caps+0x2c/0x4c [mv88e6xxx]
[ 6.070106] r7:c1ff31c0 r6:00000004 r5:c1ff31c0 r4:f1221aa4
[ 6.070118] mv88e6xxx_get_caps [mv88e6xxx] from mv88e6xxx_setup_port+0x84/0x60c [mv88e6xxx]
[ 6.070575] r7:c1ff31c0 r6:00000004 r5:c1e40040 r4:bf16e510
[ 6.070588] mv88e6xxx_setup_port [mv88e6xxx] from mv88e6xxx_setup+0x3e0/0x550 [mv88e6xxx]
[ 6.071048] r9:c1083600 r8:c1e40e04 r7:00000004 r6:c1ff31c0 r5:c1e40040 r4:bf16e510
[ 6.071062] mv88e6xxx_setup [mv88e6xxx] from dsa_register_switch+0x738/0xba8 [dsa_core]
[ 6.071451] r8:c23395c8 r7:c1ff31c0 r6:c23395c0 r5:c1ff31c0 r4:00000000
[ 6.071464] dsa_register_switch [dsa_core] from mv88e6xxx_probe+0x640/0x6a8 [mv88e6xxx]
[ 6.071835] r10:c1164000 r9:bf171a2c r8:ef7f00f8 r7:00000000 r6:00000000 r5:c1e40040
[ 6.071851] r4:c13ea000
[ 6.071862] mv88e6xxx_probe [mv88e6xxx] from mdio_probe+0x34/0x50
[ 6.072133] r10:c1164000 r9:c1e6a5b8 r8:bf173c4c r7:00000000 r6:bf173000 r5:c13ea000
[ 6.072149] r4:bf173000
[ 6.072160] mdio_probe from really_probe+0x14c/0x2b8
[ 6.072206] r5:c13ea000 r4:00000000
[ 6.072218] really_probe from __driver_probe_device+0xcc/0xe0
[ 6.072260] r7:00000039 r6:c13ea000 r5:bf173000 r4:c13ea000
[ 6.072273] __driver_probe_device from driver_probe_device+0x40/0xbc
[ 6.072312] r5:c13ea000 r4:c0cbdbc0
[ 6.072324] driver_probe_device from __driver_attach+0xf0/0x104
[ 6.072367] r7:c0c7c790 r6:bf173000 r5:c13ea000 r4:00000000
[ 6.072380] __driver_attach from bus_for_each_dev+0x70/0xb4
[ 6.072434] r7:c0c7c790 r6:c04e87e8 r5:bf173000 r4:c13ea000
[ 6.072446] bus_for_each_dev from driver_attach+0x20/0x28
[ 6.072507] r6:00000000 r5:c1e6a580 r4:bf173000
[ 6.072520] driver_attach from bus_add_driver+0xbc/0x1cc
[ 6.072574] bus_add_driver from driver_register+0xb4/0xfc
[ 6.072633] r9:bf102000 r8:bf173c4c r7:f1221ea0 r6:c1ff8800 r5:bf173c40 r4:bf173000
[ 6.072647] driver_register from mdio_driver_register+0x34/0x68
[ 6.072696] r5:bf173c40 r4:bf173000
[ 6.072708] mdio_driver_register from mdio_module_init+0x14/0x1000 [mv88e6xxx]
[ 6.072976] r5:bf173c40 r4:c0c88000
[ 6.072988] mdio_module_init [mv88e6xxx] from do_one_initcall+0x6c/0x1d8
[ 6.073239] do_one_initcall from do_init_module+0x48/0x1c4
[ 6.073292] r10:0000000c r9:00000000 r8:bf173c4c r7:f1221ea0 r6:c1ff8800 r5:bf173c40
[ 6.073308] r4:bf173c40
[ 6.073319] do_init_module from load_module+0x1558/0x1698
[ 6.073362] r6:00000000 r5:bf173c40 r4:f1221f28
[ 6.073374] load_module from sys_finit_module+0xdc/0xec
[ 6.073423] r10:0000017b r9:c1164000 r8:c020029c r7:0000017b r6:0000000f r5:b6edbee8
[ 6.073439] r4:00000000
[ 6.073450] sys_finit_module from ret_fast_syscall+0x0/0x4c
[ 6.073485] Exception stack(0xf1221fa8 to 0xf1221ff0)
[ 6.073515] 1fa0: 00020000 00000000 0000000f b6edbee8 00000000 004a3420
[ 6.073544] 1fc0: 00020000 00000000 00000000 0000017b 004a96a8 beb9292c 00000000 004a3420
[ 6.073567] 1fe0: beb92820 beb92810 b6ed5294 b6b98c30
[ 6.073589] r6:00000000 r5:00000000 r4:00020000

[ 6.073658] >>>>> (mv88e6xxx_reg_lock|797) from mv88e6352_phylink_get_caps+0x60/0x164 [mv88e6xxx] thread_id: 287


2022-12-14 12:01:39

by Maksim Kiselev

[permalink] [raw]
Subject: Re: Subject: Locking mv88e6xxx_reg_lock twice leads deadlock for 88E6176 switch

Thanks for the quick fix, Vladimir!

ср, 14 дек. 2022 г. в 14:04, Vladimir Oltean <[email protected]>:
>
> On Wed, Dec 14, 2022 at 01:58:55AM +0300, Maksim Kiselev wrote:
> > Hello, friends.
>
> Hello.
>
> > I have a device with Marvell 88E6176 switch.
> > After 'mv88e6xxx: fix speed setting for CPU/DSA ports (cc1049ccee20)'commit was applied to
> > mainline kernel I faced with a problem that switch driver stuck at 'mv88e6xxx_probe' function.
>
> Sorry for that.
>
> > I made some investigations and found that 'mv88e6xxx_reg_lock' called twice from the same thread which leads to deadlock.
> >
> > I added logs to 'mv88e6xxx_reg_lock' and 'mv88e6xxx_reg_unlock' functions to see what happened.
>
> I hope you didn't spend too much time doing that. If you enable CONFIG_PROVE_LOCKING,
> you should automatically get a stack trace with the two threads that
> acquired the mutex leading to the deadlock.
>
> I've sent a patch which solves that issue here:
> https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
> I've regression-tested it on 88E6390. Please confirm with a Tested-by:
> tag on that patch that it does resolve the deadlock for 88E6176.
>
> Thanks for reporting!

2022-12-14 12:02:04

by Vladimir Oltean

[permalink] [raw]
Subject: Re: Subject: Locking mv88e6xxx_reg_lock twice leads deadlock for 88E6176 switch

On Wed, Dec 14, 2022 at 01:58:55AM +0300, Maksim Kiselev wrote:
> Hello, friends.

Hello.

> I have a device with Marvell 88E6176 switch.
> After 'mv88e6xxx: fix speed setting for CPU/DSA ports (cc1049ccee20)'commit was applied to
> mainline kernel I faced with a problem that switch driver stuck at 'mv88e6xxx_probe' function.

Sorry for that.

> I made some investigations and found that 'mv88e6xxx_reg_lock' called twice from the same thread which leads to deadlock.
>
> I added logs to 'mv88e6xxx_reg_lock' and 'mv88e6xxx_reg_unlock' functions to see what happened.

I hope you didn't spend too much time doing that. If you enable CONFIG_PROVE_LOCKING,
you should automatically get a stack trace with the two threads that
acquired the mutex leading to the deadlock.

I've sent a patch which solves that issue here:
https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
I've regression-tested it on 88E6390. Please confirm with a Tested-by:
tag on that patch that it does resolve the deadlock for 88E6176.

Thanks for reporting!