2024-02-06 12:37:02

by Horatiu Vultur

[permalink] [raw]
Subject: [PATCH net v2] lan966x: Fix crash when adding interface under a lag

There is a crash when adding one of the lan966x interfaces under a lag
interface. The issue can be reproduced like this:
ip link add name bond0 type bond miimon 100 mode balance-xor
ip link set dev eth0 master bond0

The reason is because when adding a interface under the lag it would go
through all the ports and try to figure out which other ports are under
that lag interface. And the issue is that lan966x can have ports that are
NULL pointer as they are not probed. So then iterating over these ports
it would just crash as they are NULL pointers.
The fix consists in actually checking for NULL pointers before accessing
something from the ports. Like we do in other places.

Fixes: cabc9d49333d ("net: lan966x: Add lag support for lan966x")
Signed-off-by: Horatiu Vultur <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
---
v1->v2:
- replace lan966x->ports[lag]->bond with port->bond as it is the same
and easier to follow
---
drivers/net/ethernet/microchip/lan966x/lan966x_lag.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c b/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c
index 41fa2523d91d3..5f2cd9a8cf8fb 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c
@@ -37,19 +37,24 @@ static void lan966x_lag_set_aggr_pgids(struct lan966x *lan966x)

/* Now, set PGIDs for each active LAG */
for (lag = 0; lag < lan966x->num_phys_ports; ++lag) {
- struct net_device *bond = lan966x->ports[lag]->bond;
+ struct lan966x_port *port = lan966x->ports[lag];
int num_active_ports = 0;
+ struct net_device *bond;
unsigned long bond_mask;
u8 aggr_idx[16];

- if (!bond || (visited & BIT(lag)))
+ if (!port || !port->bond || (visited & BIT(lag)))
continue;

+ bond = port->bond;
bond_mask = lan966x_lag_get_mask(lan966x, bond);

for_each_set_bit(p, &bond_mask, lan966x->num_phys_ports) {
struct lan966x_port *port = lan966x->ports[p];

+ if (!port)
+ continue;
+
lan_wr(ANA_PGID_PGID_SET(bond_mask),
lan966x, ANA_PGID(p));
if (port->lag_tx_active)
--
2.34.1



2024-02-06 20:52:33

by Simon Horman

[permalink] [raw]
Subject: Re: [PATCH net v2] lan966x: Fix crash when adding interface under a lag

On Tue, Feb 06, 2024 at 01:30:54PM +0100, Horatiu Vultur wrote:
> There is a crash when adding one of the lan966x interfaces under a lag
> interface. The issue can be reproduced like this:
> ip link add name bond0 type bond miimon 100 mode balance-xor
> ip link set dev eth0 master bond0
>
> The reason is because when adding a interface under the lag it would go
> through all the ports and try to figure out which other ports are under
> that lag interface. And the issue is that lan966x can have ports that are
> NULL pointer as they are not probed. So then iterating over these ports
> it would just crash as they are NULL pointers.
> The fix consists in actually checking for NULL pointers before accessing
> something from the ports. Like we do in other places.
>
> Fixes: cabc9d49333d ("net: lan966x: Add lag support for lan966x")
> Signed-off-by: Horatiu Vultur <[email protected]>
> Reviewed-by: Michal Swiatkowski <[email protected]>

Reviewed-by: Simon Horman <[email protected]>

2024-02-09 21:52:52

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net v2] lan966x: Fix crash when adding interface under a lag

On Tue, 6 Feb 2024 13:30:54 +0100 Horatiu Vultur wrote:
> for (lag = 0; lag < lan966x->num_phys_ports; ++lag) {
> - struct net_device *bond = lan966x->ports[lag]->bond;
> + struct lan966x_port *port = lan966x->ports[lag];
> int num_active_ports = 0;
> + struct net_device *bond;
> unsigned long bond_mask;
> u8 aggr_idx[16];
>
> - if (!bond || (visited & BIT(lag)))
> + if (!port || !port->bond || (visited & BIT(lag)))
> continue;
>
> + bond = port->bond;
> bond_mask = lan966x_lag_get_mask(lan966x, bond);
>
> for_each_set_bit(p, &bond_mask, lan966x->num_phys_ports) {
> struct lan966x_port *port = lan966x->ports[p];
>
> + if (!port)
> + continue;

Why would lan966x_lag_get_mask() set a bit for a port that doesn't
exist? Earlier check makes sense. This one seems too defensive.
--
pw-bot: cr

2024-02-12 08:11:25

by Horatiu Vultur

[permalink] [raw]
Subject: Re: [PATCH net v2] lan966x: Fix crash when adding interface under a lag

The 02/09/2024 13:52, Jakub Kicinski wrote:

Hi Jakub,

>
> On Tue, 6 Feb 2024 13:30:54 +0100 Horatiu Vultur wrote:
> > for (lag = 0; lag < lan966x->num_phys_ports; ++lag) {
> > - struct net_device *bond = lan966x->ports[lag]->bond;
> > + struct lan966x_port *port = lan966x->ports[lag];
> > int num_active_ports = 0;
> > + struct net_device *bond;
> > unsigned long bond_mask;
> > u8 aggr_idx[16];
> >
> > - if (!bond || (visited & BIT(lag)))
> > + if (!port || !port->bond || (visited & BIT(lag)))
> > continue;
> >
> > + bond = port->bond;
> > bond_mask = lan966x_lag_get_mask(lan966x, bond);
> >
> > for_each_set_bit(p, &bond_mask, lan966x->num_phys_ports) {
> > struct lan966x_port *port = lan966x->ports[p];
> >
> > + if (!port)
> > + continue;
>
> Why would lan966x_lag_get_mask() set a bit for a port that doesn't
> exist? Earlier check makes sense. This one seems too defensive.

You are right, the lan966x_lag_get_mask() will not set a bit for a port
that doesn't exist[1]. Therefore this check is not needed.

[1] https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c#L354

> --
> pw-bot: cr

--
/Horatiu

2024-02-12 08:33:18

by Horatiu Vultur

[permalink] [raw]
Subject: Re: [PATCH net v2] lan966x: Fix crash when adding interface under a lag

The 02/12/2024 09:10, Horatiu Vultur wrote:
> The 02/09/2024 13:52, Jakub Kicinski wrote:
>
> Hi Jakub,
>
> >
> > On Tue, 6 Feb 2024 13:30:54 +0100 Horatiu Vultur wrote:
> > > for (lag = 0; lag < lan966x->num_phys_ports; ++lag) {
> > > - struct net_device *bond = lan966x->ports[lag]->bond;
> > > + struct lan966x_port *port = lan966x->ports[lag];
> > > int num_active_ports = 0;
> > > + struct net_device *bond;
> > > unsigned long bond_mask;
> > > u8 aggr_idx[16];
> > >
> > > - if (!bond || (visited & BIT(lag)))
> > > + if (!port || !port->bond || (visited & BIT(lag)))
> > > continue;
> > >
> > > + bond = port->bond;
> > > bond_mask = lan966x_lag_get_mask(lan966x, bond);
> > >
> > > for_each_set_bit(p, &bond_mask, lan966x->num_phys_ports) {
> > > struct lan966x_port *port = lan966x->ports[p];
> > >
> > > + if (!port)
> > > + continue;
> >
> > Why would lan966x_lag_get_mask() set a bit for a port that doesn't
> > exist? Earlier check makes sense. This one seems too defensive.
>
> You are right, the lan966x_lag_get_mask() will not set a bit for a port
> that doesn't exist[1]. Therefore this check is not needed.
>
> [1] https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c#L354

While trying to rebase on net, the next version of this patch, I have seen that
actually this version was accepted even though it was marked as "Changes
Requested".
The commit sha is: 15faa1f67ab405d47789d4702f587ec7df7ef03e

How do you prefer to go forward from here?
- do you want to revert this and then I will send a new version?
- should I send a patch that just removes this unneeded check?
- any other suggestion?

>
> > --
> > pw-bot: cr
>
> --
> /Horatiu

--
/Horatiu

2024-02-13 01:21:09

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net v2] lan966x: Fix crash when adding interface under a lag

On Mon, 12 Feb 2024 09:32:29 +0100 Horatiu Vultur wrote:
> > You are right, the lan966x_lag_get_mask() will not set a bit for a port
> > that doesn't exist[1]. Therefore this check is not needed.
> >
> > [1] https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c#L354
>
> While trying to rebase on net, the next version of this patch, I have seen that
> actually this version was accepted even though it was marked as "Changes
> Requested".
> The commit sha is: 15faa1f67ab405d47789d4702f587ec7df7ef03e
>
> How do you prefer to go forward from here?
> - do you want to revert this and then I will send a new version?
> - should I send a patch that just removes this unneeded check?
> - any other suggestion?

Sorry about that, I must have forgotten to reset the tree after viewing
and didn't spot this between Brenos patches :S No big deal, let's leave
it as is.