2018-12-24 11:57:48

by Ivan Mironov

[permalink] [raw]
Subject: [PATCH v1] bnx2x: Fix NULL pointer dereference in bnx2x_del_all_vlans() on some hw

This happened when I tried to boot normal Fedora 29 system with latest
available kernel (from fedora rawhide, plus some unrelated custom
patches):

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
PGD 0 P4D 0
Oops: 0010 [#1] SMP PTI
CPU: 6 PID: 1422 Comm: libvirtd Tainted: G I 4.20.0-0.rc7.git3.hpsa2.1.fc29.x86_64 #1
Hardware name: HP ProLiant BL460c G6, BIOS I24 05/21/2018
RIP: 0010: (null)
Code: Bad RIP value.
RSP: 0018:ffffa47ccdc9fbe0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000000003e8 RCX: ffffa47ccdc9fbf8
RDX: ffffa47ccdc9fc00 RSI: ffff97d9ee7b01f8 RDI: ffff97d9f0150b80
RBP: ffff97d9f0150b80 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
R13: ffff97d9ef1e53e8 R14: 0000000000000009 R15: ffff97d9f0ac6730
FS: 00007f4d224ef700(0000) GS:ffff97d9fa200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 00000011ece52006 CR4: 00000000000206e0
Call Trace:
? bnx2x_chip_cleanup+0x195/0x610 [bnx2x]
? bnx2x_nic_unload+0x1e2/0x8f0 [bnx2x]
? bnx2x_reload_if_running+0x24/0x40 [bnx2x]
? bnx2x_set_features+0x79/0xa0 [bnx2x]
? __netdev_update_features+0x244/0x9e0
? netlink_broadcast_filtered+0x136/0x4b0
? netdev_update_features+0x22/0x60
? dev_disable_lro+0x1c/0xe0
? devinet_sysctl_forward+0x1c6/0x211
? proc_sys_call_handler+0xab/0x100
? __vfs_write+0x36/0x1a0
? rcu_read_lock_sched_held+0x79/0x80
? rcu_sync_lockdep_assert+0x2e/0x60
? __sb_start_write+0x14c/0x1b0
? vfs_write+0x159/0x1c0
? vfs_write+0xba/0x1c0
? ksys_write+0x52/0xc0
? do_syscall_64+0x60/0x1f0
? entry_SYSCALL_64_after_hwframe+0x49/0xbe

After some investigation I figured out that recently added cleanup code
tries to call VLAN filtering de-initialization function which exist only
for newer hardware. Corresponding function pointer is not
set (== 0) for older hardware, namely these chips:

#define CHIP_NUM_57710 0x164e
#define CHIP_NUM_57711 0x164f
#define CHIP_NUM_57711E 0x1650

And I have one of those in my test system:

Broadcom Inc. and subsidiaries NetXtreme II BCM57711E 10-Gigabit PCIe [14e4:1650]

Function bnx2x_init_vlan_mac_fp_objs() from
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h decides whether to
initialize relevant pointers in bnx2x_sp_objs.vlan_obj or not.

This regression was introduced after v4.20-rc7, and still exists in v4.20
release.

Fixes: 04f05230c5c13 ("bnx2x: Remove configured vlans as part of unload sequence.")
Signed-off-by: Ivan Mironov <[email protected]>
---
v1:
- Check for chip num instead of (vlan_obj->delete_all != 0).
v0:
- Patch introduced.
---
.../net/ethernet/broadcom/bnx2x/bnx2x_main.c | 22 +++++++++++++------
1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index b164f705709d..4d72e9948a58 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -8504,15 +8504,23 @@ int bnx2x_set_vlan_one(struct bnx2x *bp, u16 vlan,
static int bnx2x_del_all_vlans(struct bnx2x *bp)
{
struct bnx2x_vlan_mac_obj *vlan_obj = &bp->sp_objs[0].vlan_obj;
- unsigned long ramrod_flags = 0, vlan_flags = 0;
struct bnx2x_vlan_entry *vlan;
- int rc;

- __set_bit(RAMROD_COMP_WAIT, &ramrod_flags);
- __set_bit(BNX2X_VLAN, &vlan_flags);
- rc = vlan_obj->delete_all(bp, vlan_obj, &vlan_flags, &ramrod_flags);
- if (rc)
- return rc;
+ /* The whole *vlan_obj structure may be not initialized if VLAN
+ * filtering offload is not supported by hardware. Currently this is
+ * true for all hardware covered by CHIP_IS_E1x().
+ */
+ if (!CHIP_IS_E1x(bp)) {
+ unsigned long ramrod_flags = 0, vlan_flags = 0;
+ int rc;
+
+ __set_bit(RAMROD_COMP_WAIT, &ramrod_flags);
+ __set_bit(BNX2X_VLAN, &vlan_flags);
+ rc = vlan_obj->delete_all(bp, vlan_obj, &vlan_flags,
+ &ramrod_flags);
+ if (rc)
+ return rc;
+ }

/* Mark that hw forgot all entries */
list_for_each_entry(vlan, &bp->vlan_reg, link)
--
2.20.1



2018-12-24 14:28:55

by Sudarsana Reddy Kalluru

[permalink] [raw]
Subject: RE: [EXT] [PATCH v1] bnx2x: Fix NULL pointer dereference in bnx2x_del_all_vlans() on some hw


-----Original Message-----
From: Ivan Mironov [mailto:[email protected]]
Sent: 24 December 2018 17:26
To: [email protected]
Cc: Ariel Elior <[email protected]>; Sudarsana Kalluru <[email protected]>; [email protected]; David S. Miller <[email protected]>; [email protected]; Ivan Mironov <[email protected]>
Subject: [EXT] [PATCH v1] bnx2x: Fix NULL pointer dereference in bnx2x_del_all_vlans() on some hw

External Email

----------------------------------------------------------------------
External Email

This happened when I tried to boot normal Fedora 29 system with latest available kernel (from fedora rawhide, plus some unrelated custom
patches):

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
PGD 0 P4D 0
Oops: 0010 [#1] SMP PTI
CPU: 6 PID: 1422 Comm: libvirtd Tainted: G I 4.20.0-0.rc7.git3.hpsa2.1.fc29.x86_64 #1
Hardware name: HP ProLiant BL460c G6, BIOS I24 05/21/2018
RIP: 0010: (null)
Code: Bad RIP value.
RSP: 0018:ffffa47ccdc9fbe0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000000003e8 RCX: ffffa47ccdc9fbf8
RDX: ffffa47ccdc9fc00 RSI: ffff97d9ee7b01f8 RDI: ffff97d9f0150b80
RBP: ffff97d9f0150b80 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
R13: ffff97d9ef1e53e8 R14: 0000000000000009 R15: ffff97d9f0ac6730
FS: 00007f4d224ef700(0000) GS:ffff97d9fa200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 00000011ece52006 CR4: 00000000000206e0
Call Trace:
? bnx2x_chip_cleanup+0x195/0x610 [bnx2x]
? bnx2x_nic_unload+0x1e2/0x8f0 [bnx2x]
? bnx2x_reload_if_running+0x24/0x40 [bnx2x]
? bnx2x_set_features+0x79/0xa0 [bnx2x]
? __netdev_update_features+0x244/0x9e0
? netlink_broadcast_filtered+0x136/0x4b0
? netdev_update_features+0x22/0x60
? dev_disable_lro+0x1c/0xe0
? devinet_sysctl_forward+0x1c6/0x211
? proc_sys_call_handler+0xab/0x100
? __vfs_write+0x36/0x1a0
? rcu_read_lock_sched_held+0x79/0x80
? rcu_sync_lockdep_assert+0x2e/0x60
? __sb_start_write+0x14c/0x1b0
? vfs_write+0x159/0x1c0
? vfs_write+0xba/0x1c0
? ksys_write+0x52/0xc0
? do_syscall_64+0x60/0x1f0
? entry_SYSCALL_64_after_hwframe+0x49/0xbe

After some investigation I figured out that recently added cleanup code tries to call VLAN filtering de-initialization function which exist only for newer hardware. Corresponding function pointer is not set (== 0) for older hardware, namely these chips:

#define CHIP_NUM_57710 0x164e
#define CHIP_NUM_57711 0x164f
#define CHIP_NUM_57711E 0x1650

And I have one of those in my test system:

Broadcom Inc. and subsidiaries NetXtreme II BCM57711E 10-Gigabit PCIe [14e4:1650]

Function bnx2x_init_vlan_mac_fp_objs() from drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h decides whether to initialize relevant pointers in bnx2x_sp_objs.vlan_obj or not.

This regression was introduced after v4.20-rc7, and still exists in v4.20 release.

Fixes: 04f05230c5c13 ("bnx2x: Remove configured vlans as part of unload sequence.")
Signed-off-by: Ivan Mironov <[email protected]>
---
v1:
- Check for chip num instead of (vlan_obj->delete_all != 0).
v0:
- Patch introduced.
---
.../net/ethernet/broadcom/bnx2x/bnx2x_main.c | 22 +++++++++++++------
1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index b164f705709d..4d72e9948a58 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -8504,15 +8504,23 @@ int bnx2x_set_vlan_one(struct bnx2x *bp, u16 vlan, static int bnx2x_del_all_vlans(struct bnx2x *bp) {
struct bnx2x_vlan_mac_obj *vlan_obj = &bp->sp_objs[0].vlan_obj;
- unsigned long ramrod_flags = 0, vlan_flags = 0;
struct bnx2x_vlan_entry *vlan;
- int rc;

- __set_bit(RAMROD_COMP_WAIT, &ramrod_flags);
- __set_bit(BNX2X_VLAN, &vlan_flags);
- rc = vlan_obj->delete_all(bp, vlan_obj, &vlan_flags, &ramrod_flags);
- if (rc)
- return rc;
+ /* The whole *vlan_obj structure may be not initialized if VLAN
+ * filtering offload is not supported by hardware. Currently this is
+ * true for all hardware covered by CHIP_IS_E1x().
+ */
+ if (!CHIP_IS_E1x(bp)) {
+ unsigned long ramrod_flags = 0, vlan_flags = 0;
+ int rc;
+
+ __set_bit(RAMROD_COMP_WAIT, &ramrod_flags);
+ __set_bit(BNX2X_VLAN, &vlan_flags);
+ rc = vlan_obj->delete_all(bp, vlan_obj, &vlan_flags,
+ &ramrod_flags);
+ if (rc)
+ return rc;
+ }

/* Mark that hw forgot all entries */
list_for_each_entry(vlan, &bp->vlan_reg, link)
--
2.20.1

Thanks a lot for root-causing the issue and the patch.
Another (simpler) way to address this would be to invoke bnx2x_del_all_vlans() only for the newer chips, e.g.,
+ if (!CHIP_IS_E1(bp)) {
/* Remove all currently configured VLANs */
rc = bnx2x_del_all_vlans(bp);
if (rc < 0)
BNX2X_ERR("Failed to delete all VLANs\n");
+ }
Could you please check on this?

2018-12-24 15:19:36

by Ivan Mironov

[permalink] [raw]
Subject: Re: [EXT] [PATCH v1] bnx2x: Fix NULL pointer dereference in bnx2x_del_all_vlans() on some hw

On Mon, 2018-12-24 at 14:26 +0000, Sudarsana Reddy Kalluru wrote:
> Thanks a lot for root-causing the issue and the patch.
> Another (simpler) way to address this would be to invoke bnx2x_del_all_vlans() only for the newer chips, e.g.,
> + if (!CHIP_IS_E1(bp)) {
> /* Remove all currently configured VLANs */
> rc = bnx2x_del_all_vlans(bp);
> if (rc < 0)
> BNX2X_ERR("Failed to delete all VLANs\n");
> + }
> Could you please check on this?

Done. See the PATCH v2 in the mailing list. I tested it on my system.