2023-08-07 10:22:34

by Petr Oros

[permalink] [raw]
Subject: [PATCH net 2/2] ice: Fix NULL pointer deref during VF reset

During stress test with attaching and detaching VF from KVM and
simultaneously changing VFs spoofcheck and trust there was a
NULL pointer dereference in ice_reset_vf that VF's VSI is null.

More than one instance of ice_reset_vf() can be running at a given
time. When we rebuild the VSI in ice_reset_vf, another reset can be
triaged from ice_service_task. In this case we can access the currently
uninitialized VSI and couse panic. The window for this racing condition
has been around for a long time but it's much worse after commit
227bf4500aaa ("ice: move VSI delete outside deconfig") because
the reset runs faster. ice_reset_vf() using vf->cfg_lock and when
we move this lock before accessing to the VF VSI, we can fix
BUG for all cases.

Panic occurs sometimes in ice_vsi_is_rx_queue_active() and sometimes
in ice_vsi_stop_all_rx_rings()

With our reproducer, we can hint BUG:
~8h before commit 227bf4500aaa ("ice: move VSI delete outside deconfig").
~20m after commit 227bf4500aaa ("ice: move VSI delete outside deconfig").
After this fix we are not able to reproduce it after ~48h

There was commit cf90b74341ee ("ice: Fix call trace with null VSI during
VF reset") which also tried to fix this issue, but it was only
partially resolved and the bug still exists.

[ 6420.658415] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 6420.665382] #PF: supervisor read access in kernel mode
[ 6420.670521] #PF: error_code(0x0000) - not-present page
[ 6420.675659] PGD 0
[ 6420.677679] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 6420.682038] CPU: 53 PID: 326472 Comm: kworker/53:0 Kdump: loaded Not tainted 5.14.0-317.el9.x86_64 #1
[ 6420.691250] Hardware name: Dell Inc. PowerEdge R750/04V528, BIOS 1.6.5 04/15/2022
[ 6420.698729] Workqueue: ice ice_service_task [ice]
[ 6420.703462] RIP: 0010:ice_vsi_is_rx_queue_active+0x2d/0x60 [ice]
[ 6420.705860] ice 0000:ca:00.0: VF 0 is now untrusted
[ 6420.709494] Code: 00 00 66 83 bf 76 04 00 00 00 48 8b 77 10 74 3e 31 c0 eb 0f 0f b7 97 76 04 00 00 48 83 c0 01 39 c2 7e 2b 48 8b 97 68 04 00 00 <0f> b7 0c 42 48 8b 96 20 13 00 00 48 8d 94 8a 00 00 12 00 8b 12 83
[ 6420.714426] ice 0000:ca:00.0 ens7f0: Setting MAC 22:22:22:22:22:00 on VF 0. VF driver will be reinitialized
[ 6420.733120] RSP: 0018:ff778d2ff383fdd8 EFLAGS: 00010246
[ 6420.733123] RAX: 0000000000000000 RBX: ff2acf1916294000 RCX: 0000000000000000
[ 6420.733125] RDX: 0000000000000000 RSI: ff2acf1f2c6401a0 RDI: ff2acf1a27301828
[ 6420.762346] RBP: ff2acf1a27301828 R08: 0000000000000010 R09: 0000000000001000
[ 6420.769476] R10: ff2acf1916286000 R11: 00000000019eba3f R12: ff2acf19066460d0
[ 6420.776611] R13: ff2acf1f2c6401a0 R14: ff2acf1f2c6401a0 R15: 00000000ffffffff
[ 6420.783742] FS: 0000000000000000(0000) GS:ff2acf28ffa80000(0000) knlGS:0000000000000000
[ 6420.791829] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6420.797575] CR2: 0000000000000000 CR3: 00000016ad410003 CR4: 0000000000773ee0
[ 6420.804708] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6420.811034] vfio-pci 0000:ca:01.0: enabling device (0000 -> 0002)
[ 6420.811840] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 6420.811841] PKRU: 55555554
[ 6420.811842] Call Trace:
[ 6420.811843] <TASK>
[ 6420.811844] ice_reset_vf+0x9a/0x450 [ice]
[ 6420.811876] ice_process_vflr_event+0x8f/0xc0 [ice]
[ 6420.841343] ice_service_task+0x23b/0x600 [ice]
[ 6420.845884] ? __schedule+0x212/0x550
[ 6420.849550] process_one_work+0x1e2/0x3b0
[ 6420.853563] ? rescuer_thread+0x390/0x390
[ 6420.857577] worker_thread+0x50/0x3a0
[ 6420.861242] ? rescuer_thread+0x390/0x390
[ 6420.865253] kthread+0xdd/0x100
[ 6420.868400] ? kthread_complete_and_exit+0x20/0x20
[ 6420.873194] ret_from_fork+0x1f/0x30
[ 6420.876774] </TASK>
[ 6420.878967] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iavf vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables bridge stp llc sctp ip6_udp_tunnel udp_tunnel nfp tls nfnetlink bluetooth mlx4_en mlx4_core rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common i10nm_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp irdma kvm_intel i40e kvm iTCO_wdt dcdbas ib_uverbs irqbypass iTCO_vendor_support mgag200 mei_me ib_core dell_smbios isst_if_mmio isst_if_mbox_pci rapl i2c_algo_bit drm_shmem_helper intel_cstate drm_kms_helper syscopyarea sysfillrect isst_if_common sysimgblt intel_uncore fb_sys_fops dell_wmi_descriptor wmi_bmof intel_vsec mei i2c_i801 acpi_ipmi ipmi_si i2c_smbus ipmi_devintf intel_pch_thermal acpi_power_meter pcspkr

Fixes: efe41860008e ("ice: Fix memory corruption in VF driver")
Fixes: f23df5220d2b ("ice: Fix spurious interrupt during removal of trusted VF")
Signed-off-by: Petr Oros <[email protected]>
---
drivers/net/ethernet/intel/ice/ice_vf_lib.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.c b/drivers/net/ethernet/intel/ice/ice_vf_lib.c
index 294e91c3453ccd..ea3310be8354cf 100644
--- a/drivers/net/ethernet/intel/ice/ice_vf_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.c
@@ -612,11 +612,17 @@ int ice_reset_vf(struct ice_vf *vf, u32 flags)
return 0;
}

+ if (flags & ICE_VF_RESET_LOCK)
+ mutex_lock(&vf->cfg_lock);
+ else
+ lockdep_assert_held(&vf->cfg_lock);
+
if (ice_is_vf_disabled(vf)) {
vsi = ice_get_vf_vsi(vf);
if (!vsi) {
dev_dbg(dev, "VF is already removed\n");
- return -EINVAL;
+ err = -EINVAL;
+ goto out_unlock;
}
ice_vsi_stop_lan_tx_rings(vsi, ICE_NO_RESET, vf->vf_id);

@@ -625,14 +631,9 @@ int ice_reset_vf(struct ice_vf *vf, u32 flags)

dev_dbg(dev, "VF is already disabled, there is no need for resetting it, telling VM, all is fine %d\n",
vf->vf_id);
- return 0;
+ goto out_unlock;
}

- if (flags & ICE_VF_RESET_LOCK)
- mutex_lock(&vf->cfg_lock);
- else
- lockdep_assert_held(&vf->cfg_lock);
-
/* Set VF disable bit state here, before triggering reset */
set_bit(ICE_VF_STATE_DIS, vf->vf_states);
ice_trigger_vf_reset(vf, flags & ICE_VF_RESET_VFLR, false);
--
2.41.0



2023-08-08 19:18:33

by Simon Horman

[permalink] [raw]
Subject: Re: [PATCH net 2/2] ice: Fix NULL pointer deref during VF reset

On Mon, Aug 07, 2023 at 11:48:31AM +0200, Petr Oros wrote:
> During stress test with attaching and detaching VF from KVM and
> simultaneously changing VFs spoofcheck and trust there was a
> NULL pointer dereference in ice_reset_vf that VF's VSI is null.
>
> More than one instance of ice_reset_vf() can be running at a given
> time. When we rebuild the VSI in ice_reset_vf, another reset can be
> triaged from ice_service_task. In this case we can access the currently
> uninitialized VSI and couse panic. The window for this racing condition

nit: couse -> cause

> has been around for a long time but it's much worse after commit
> 227bf4500aaa ("ice: move VSI delete outside deconfig") because
> the reset runs faster. ice_reset_vf() using vf->cfg_lock and when
> we move this lock before accessing to the VF VSI, we can fix
> BUG for all cases.
>
> Panic occurs sometimes in ice_vsi_is_rx_queue_active() and sometimes
> in ice_vsi_stop_all_rx_rings()
>
> With our reproducer, we can hint BUG:
> ~8h before commit 227bf4500aaa ("ice: move VSI delete outside deconfig").
> ~20m after commit 227bf4500aaa ("ice: move VSI delete outside deconfig").
> After this fix we are not able to reproduce it after ~48h
>
> There was commit cf90b74341ee ("ice: Fix call trace with null VSI during
> VF reset") which also tried to fix this issue, but it was only
> partially resolved and the bug still exists.
>
> [ 6420.658415] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [ 6420.665382] #PF: supervisor read access in kernel mode
> [ 6420.670521] #PF: error_code(0x0000) - not-present page
> [ 6420.675659] PGD 0
> [ 6420.677679] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [ 6420.682038] CPU: 53 PID: 326472 Comm: kworker/53:0 Kdump: loaded Not tainted 5.14.0-317.el9.x86_64 #1
> [ 6420.691250] Hardware name: Dell Inc. PowerEdge R750/04V528, BIOS 1.6.5 04/15/2022
> [ 6420.698729] Workqueue: ice ice_service_task [ice]
> [ 6420.703462] RIP: 0010:ice_vsi_is_rx_queue_active+0x2d/0x60 [ice]
> [ 6420.705860] ice 0000:ca:00.0: VF 0 is now untrusted
> [ 6420.709494] Code: 00 00 66 83 bf 76 04 00 00 00 48 8b 77 10 74 3e 31 c0 eb 0f 0f b7 97 76 04 00 00 48 83 c0 01 39 c2 7e 2b 48 8b 97 68 04 00 00 <0f> b7 0c 42 48 8b 96 20 13 00 00 48 8d 94 8a 00 00 12 00 8b 12 83
> [ 6420.714426] ice 0000:ca:00.0 ens7f0: Setting MAC 22:22:22:22:22:00 on VF 0. VF driver will be reinitialized
> [ 6420.733120] RSP: 0018:ff778d2ff383fdd8 EFLAGS: 00010246
> [ 6420.733123] RAX: 0000000000000000 RBX: ff2acf1916294000 RCX: 0000000000000000
> [ 6420.733125] RDX: 0000000000000000 RSI: ff2acf1f2c6401a0 RDI: ff2acf1a27301828
> [ 6420.762346] RBP: ff2acf1a27301828 R08: 0000000000000010 R09: 0000000000001000
> [ 6420.769476] R10: ff2acf1916286000 R11: 00000000019eba3f R12: ff2acf19066460d0
> [ 6420.776611] R13: ff2acf1f2c6401a0 R14: ff2acf1f2c6401a0 R15: 00000000ffffffff
> [ 6420.783742] FS: 0000000000000000(0000) GS:ff2acf28ffa80000(0000) knlGS:0000000000000000
> [ 6420.791829] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 6420.797575] CR2: 0000000000000000 CR3: 00000016ad410003 CR4: 0000000000773ee0
> [ 6420.804708] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 6420.811034] vfio-pci 0000:ca:01.0: enabling device (0000 -> 0002)
> [ 6420.811840] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 6420.811841] PKRU: 55555554
> [ 6420.811842] Call Trace:
> [ 6420.811843] <TASK>
> [ 6420.811844] ice_reset_vf+0x9a/0x450 [ice]
> [ 6420.811876] ice_process_vflr_event+0x8f/0xc0 [ice]
> [ 6420.841343] ice_service_task+0x23b/0x600 [ice]
> [ 6420.845884] ? __schedule+0x212/0x550
> [ 6420.849550] process_one_work+0x1e2/0x3b0
> [ 6420.853563] ? rescuer_thread+0x390/0x390
> [ 6420.857577] worker_thread+0x50/0x3a0
> [ 6420.861242] ? rescuer_thread+0x390/0x390
> [ 6420.865253] kthread+0xdd/0x100
> [ 6420.868400] ? kthread_complete_and_exit+0x20/0x20
> [ 6420.873194] ret_from_fork+0x1f/0x30
> [ 6420.876774] </TASK>
> [ 6420.878967] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iavf vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables bridge stp llc sctp ip6_udp_tunnel udp_tunnel nfp tls nfnetlink bluetooth mlx4_en mlx4_core rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common i10nm_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp irdma kvm_intel i40e kvm iTCO_wdt dcdbas ib_uverbs irqbypass iTCO_vendor_support mgag200 mei_me ib_core dell_smbios isst_if_mmio isst_if_mbox_pci rapl i2c_algo_bit drm_shmem_helper intel_cstate drm_kms_helper syscopyarea sysfillrect isst_if_common sysimgblt intel_uncore fb_sys_fops dell_wmi_descriptor wmi_bmof intel_vsec mei i2c_i801 acpi_ipmi ipmi_si i2c_smbus ipmi_devintf intel_pch_thermal acpi_power_meter pcspkr
>
> Fixes: efe41860008e ("ice: Fix memory corruption in VF driver")
> Fixes: f23df5220d2b ("ice: Fix spurious interrupt during removal of trusted VF")
> Signed-off-by: Petr Oros <[email protected]>

Reviewed-by: Simon Horman <[email protected]>


2023-08-08 22:51:09

by Przemek Kitszel

[permalink] [raw]
Subject: Re: [Intel-wired-lan] [PATCH net 2/2] ice: Fix NULL pointer deref during VF reset

On 8/7/23 11:48, Petr Oros wrote:
> During stress test with attaching and detaching VF from KVM and
> simultaneously changing VFs spoofcheck and trust there was a
> NULL pointer dereference in ice_reset_vf that VF's VSI is null.
>
> More than one instance of ice_reset_vf() can be running at a given
> time. When we rebuild the VSI in ice_reset_vf, another reset can be
> triaged from ice_service_task. In this case we can access the currently
> uninitialized VSI and couse panic. The window for this racing condition
> has been around for a long time but it's much worse after commit
> 227bf4500aaa ("ice: move VSI delete outside deconfig") because
> the reset runs faster. ice_reset_vf() using vf->cfg_lock and when
> we move this lock before accessing to the VF VSI, we can fix
> BUG for all cases.
>
> Panic occurs sometimes in ice_vsi_is_rx_queue_active() and sometimes
> in ice_vsi_stop_all_rx_rings()
>
> With our reproducer, we can hint BUG:

s/hint/hit/

> ~8h before commit 227bf4500aaa ("ice: move VSI delete outside deconfig").
> ~20m after commit 227bf4500aaa ("ice: move VSI delete outside deconfig").
> After this fix we are not able to reproduce it after ~48h
>
> There was commit cf90b74341ee ("ice: Fix call trace with null VSI during
> VF reset") which also tried to fix this issue, but it was only
> partially resolved and the bug still exists.
>
> [ 6420.658415] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [ 6420.665382] #PF: supervisor read access in kernel mode
> [ 6420.670521] #PF: error_code(0x0000) - not-present page
> [ 6420.675659] PGD 0
> [ 6420.677679] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [ 6420.682038] CPU: 53 PID: 326472 Comm: kworker/53:0 Kdump: loaded Not tainted 5.14.0-317.el9.x86_64 #1
> [ 6420.691250] Hardware name: Dell Inc. PowerEdge R750/04V528, BIOS 1.6.5 04/15/2022
> [ 6420.698729] Workqueue: ice ice_service_task [ice]
> [ 6420.703462] RIP: 0010:ice_vsi_is_rx_queue_active+0x2d/0x60 [ice]
> [ 6420.705860] ice 0000:ca:00.0: VF 0 is now untrusted
> [ 6420.709494] Code: 00 00 66 83 bf 76 04 00 00 00 48 8b 77 10 74 3e 31 c0 eb 0f 0f b7 97 76 04 00 00 48 83 c0 01 39 c2 7e 2b 48 8b 97 68 04 00 00 <0f> b7 0c 42 48 8b 96 20 13 00 00 48 8d 94 8a 00 00 12 00 8b 12 83
> [ 6420.714426] ice 0000:ca:00.0 ens7f0: Setting MAC 22:22:22:22:22:00 on VF 0. VF driver will be reinitialized
> [ 6420.733120] RSP: 0018:ff778d2ff383fdd8 EFLAGS: 00010246
> [ 6420.733123] RAX: 0000000000000000 RBX: ff2acf1916294000 RCX: 0000000000000000
> [ 6420.733125] RDX: 0000000000000000 RSI: ff2acf1f2c6401a0 RDI: ff2acf1a27301828
> [ 6420.762346] RBP: ff2acf1a27301828 R08: 0000000000000010 R09: 0000000000001000
> [ 6420.769476] R10: ff2acf1916286000 R11: 00000000019eba3f R12: ff2acf19066460d0
> [ 6420.776611] R13: ff2acf1f2c6401a0 R14: ff2acf1f2c6401a0 R15: 00000000ffffffff
> [ 6420.783742] FS: 0000000000000000(0000) GS:ff2acf28ffa80000(0000) knlGS:0000000000000000
> [ 6420.791829] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 6420.797575] CR2: 0000000000000000 CR3: 00000016ad410003 CR4: 0000000000773ee0
> [ 6420.804708] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 6420.811034] vfio-pci 0000:ca:01.0: enabling device (0000 -> 0002)
> [ 6420.811840] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 6420.811841] PKRU: 55555554
> [ 6420.811842] Call Trace:
> [ 6420.811843] <TASK>
> [ 6420.811844] ice_reset_vf+0x9a/0x450 [ice]
> [ 6420.811876] ice_process_vflr_event+0x8f/0xc0 [ice]
> [ 6420.841343] ice_service_task+0x23b/0x600 [ice]
> [ 6420.845884] ? __schedule+0x212/0x550
> [ 6420.849550] process_one_work+0x1e2/0x3b0
> [ 6420.853563] ? rescuer_thread+0x390/0x390
> [ 6420.857577] worker_thread+0x50/0x3a0
> [ 6420.861242] ? rescuer_thread+0x390/0x390
> [ 6420.865253] kthread+0xdd/0x100
> [ 6420.868400] ? kthread_complete_and_exit+0x20/0x20
> [ 6420.873194] ret_from_fork+0x1f/0x30
> [ 6420.876774] </TASK>
> [ 6420.878967] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iavf vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables bridge stp llc sctp ip6_udp_tunnel udp_tunnel nfp tls nfnetlink bluetooth mlx4_en mlx4_core rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common i10nm_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp irdma kvm_intel i40e kvm iTCO_wdt dcdbas ib_uverbs irqbypass iTCO_vendor_support mgag200 mei_me ib_core dell_smbios isst_if_mmio isst_if_mbox_pci rapl i2c_algo_bit drm_shmem_helper intel_cstate drm_kms_helper syscopyarea sysfillrect isst_if_common sysimgblt intel_uncore fb_sys_fops dell_wmi_descriptor wmi_bmof intel_vsec mei i2c_i801 acpi_ipmi ipmi_si i2c_smbus ipmi_devintf intel_pch_thermal acpi_power_meter pcspk
> r
>
> Fixes: efe41860008e ("ice: Fix memory corruption in VF driver")
> Fixes: f23df5220d2b ("ice: Fix spurious interrupt during removal of trusted VF")
> Signed-off-by: Petr Oros <[email protected]>

Thank you for all of your testing efforts, detailed explanation,
and the fix!

Is there anything interesting about your setup/methodology?
(Asking rather to explore and extend our tests, no doubts here :)

> ---
> drivers/net/ethernet/intel/ice/ice_vf_lib.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.c b/drivers/net/ethernet/intel/ice/ice_vf_lib.c
> index 294e91c3453ccd..ea3310be8354cf 100644
> --- a/drivers/net/ethernet/intel/ice/ice_vf_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.c
> @@ -612,11 +612,17 @@ int ice_reset_vf(struct ice_vf *vf, u32 flags)
> return 0;
> }
>
> + if (flags & ICE_VF_RESET_LOCK)
> + mutex_lock(&vf->cfg_lock);
> + else
> + lockdep_assert_held(&vf->cfg_lock);
> +
> if (ice_is_vf_disabled(vf)) {
> vsi = ice_get_vf_vsi(vf);
> if (!vsi) {
> dev_dbg(dev, "VF is already removed\n");
> - return -EINVAL;
> + err = -EINVAL;
> + goto out_unlock;
> }
> ice_vsi_stop_lan_tx_rings(vsi, ICE_NO_RESET, vf->vf_id);
>
> @@ -625,14 +631,9 @@ int ice_reset_vf(struct ice_vf *vf, u32 flags)
>
> dev_dbg(dev, "VF is already disabled, there is no need for resetting it, telling VM, all is fine %d\n",
> vf->vf_id);
> - return 0;
> + goto out_unlock;
> }
>
> - if (flags & ICE_VF_RESET_LOCK)
> - mutex_lock(&vf->cfg_lock);
> - else
> - lockdep_assert_held(&vf->cfg_lock);
> -
> /* Set VF disable bit state here, before triggering reset */
> set_bit(ICE_VF_STATE_DIS, vf->vf_states);
> ice_trigger_vf_reset(vf, flags & ICE_VF_RESET_VFLR, false);

2023-08-09 19:13:03

by Petr Oros

[permalink] [raw]
Subject: Re: [Intel-wired-lan] [PATCH net 2/2] ice: Fix NULL pointer deref during VF reset

Przemek Kitszel píše v Čt 01. 01. 1970 v 00:00 +0000:
> On 8/7/23 11:48, Petr Oros wrote:
> > During stress test with attaching and detaching VF from KVM and
> > simultaneously changing VFs spoofcheck and trust there was a
> > NULL pointer dereference in ice_reset_vf that VF's VSI is null.
> >
> > More than one instance of ice_reset_vf() can be running at a given
> > time. When we rebuild the VSI in ice_reset_vf, another reset can be
> > triaged from ice_service_task. In this case we can access the
> > currently
> > uninitialized VSI and couse panic. The window for this racing
> > condition
> > has been around for a long time but it's much worse after commit
> > 227bf4500aaa ("ice: move VSI delete outside deconfig") because
> > the reset runs faster. ice_reset_vf() using vf->cfg_lock and when
> > we move this lock before accessing to the VF VSI, we can fix
> > BUG for all cases.
> >
> > Panic occurs sometimes in ice_vsi_is_rx_queue_active() and
> > sometimes
> > in ice_vsi_stop_all_rx_rings()
> >
> > With our reproducer, we can hint BUG:
>
> s/hint/hit/
>
> > ~8h before commit 227bf4500aaa ("ice: move VSI delete outside
> > deconfig").
> > ~20m after commit 227bf4500aaa ("ice: move VSI delete outside
> > deconfig").
> > After this fix we are not able to reproduce it after ~48h
> >
> > There was commit cf90b74341ee ("ice: Fix call trace with null VSI
> > during
> > VF reset") which also tried to fix this issue, but it was only
> > partially resolved and the bug still exists.
> >
> > [ 6420.658415] BUG: kernel NULL pointer dereference, address:
> > 0000000000000000
> > [ 6420.665382] #PF: supervisor read access in kernel mode
> > [ 6420.670521] #PF: error_code(0x0000) - not-present page
> > [ 6420.675659] PGD 0
> > [ 6420.677679] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [ 6420.682038] CPU: 53 PID: 326472 Comm: kworker/53:0 Kdump: loaded
> > Not tainted 5.14.0-317.el9.x86_64 #1
> > [ 6420.691250] Hardware name: Dell Inc. PowerEdge R750/04V528, BIOS
> > 1.6.5 04/15/2022
> > [ 6420.698729] Workqueue: ice ice_service_task [ice]
> > [ 6420.703462] RIP: 0010:ice_vsi_is_rx_queue_active+0x2d/0x60 [ice]
> > [ 6420.705860] ice 0000:ca:00.0: VF 0 is now untrusted
> > [ 6420.709494] Code: 00 00 66 83 bf 76 04 00 00 00 48 8b 77 10 74
> > 3e 31 c0 eb 0f 0f b7 97 76 04 00 00 48 83 c0 01 39 c2 7e 2b 48 8b
> > 97 68 04 00 00 <0f> b7 0c 42 48 8b 96 20 13 00 00 48 8d 94 8a 00 00
> > 12 00 8b 12 83
> > [ 6420.714426] ice 0000:ca:00.0 ens7f0: Setting MAC
> > 22:22:22:22:22:00 on VF 0. VF driver will be reinitialized
> > [ 6420.733120] RSP: 0018:ff778d2ff383fdd8 EFLAGS: 00010246
> > [ 6420.733123] RAX: 0000000000000000 RBX: ff2acf1916294000 RCX:
> > 0000000000000000
> > [ 6420.733125] RDX: 0000000000000000 RSI: ff2acf1f2c6401a0 RDI:
> > ff2acf1a27301828
> > [ 6420.762346] RBP: ff2acf1a27301828 R08: 0000000000000010 R09:
> > 0000000000001000
> > [ 6420.769476] R10: ff2acf1916286000 R11: 00000000019eba3f R12:
> > ff2acf19066460d0
> > [ 6420.776611] R13: ff2acf1f2c6401a0 R14: ff2acf1f2c6401a0 R15:
> > 00000000ffffffff
> > [ 6420.783742] FS:  0000000000000000(0000)
> > GS:ff2acf28ffa80000(0000) knlGS:0000000000000000
> > [ 6420.791829] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 6420.797575] CR2: 0000000000000000 CR3: 00000016ad410003 CR4:
> > 0000000000773ee0
> > [ 6420.804708] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [ 6420.811034] vfio-pci 0000:ca:01.0: enabling device (0000 ->
> > 0002)
> > [ 6420.811840] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > 0000000000000400
> > [ 6420.811841] PKRU: 55555554
> > [ 6420.811842] Call Trace:
> > [ 6420.811843]  <TASK>
> > [ 6420.811844]  ice_reset_vf+0x9a/0x450 [ice]
> > [ 6420.811876]  ice_process_vflr_event+0x8f/0xc0 [ice]
> > [ 6420.841343]  ice_service_task+0x23b/0x600 [ice]
> > [ 6420.845884]  ? __schedule+0x212/0x550
> > [ 6420.849550]  process_one_work+0x1e2/0x3b0
> > [ 6420.853563]  ? rescuer_thread+0x390/0x390
> > [ 6420.857577]  worker_thread+0x50/0x3a0
> > [ 6420.861242]  ? rescuer_thread+0x390/0x390
> > [ 6420.865253]  kthread+0xdd/0x100
> > [ 6420.868400]  ? kthread_complete_and_exit+0x20/0x20
> > [ 6420.873194]  ret_from_fork+0x1f/0x30
> > [ 6420.876774]  </TASK>
> > [ 6420.878967] Modules linked in: vfio_pci vfio_pci_core
> > vfio_iommu_type1 vfio iavf vhost_net vhost vhost_iotlb tap tun
> > xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4
> > nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6
> > nf_defrag_ipv4 nft_counter nf_tables bridge stp llc sctp
> > ip6_udp_tunnel udp_tunnel nfp tls nfnetlink bluetooth mlx4_en
> > mlx4_core rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd
> > grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common
> > i10nm_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal
> > intel_powerclamp coretemp irdma kvm_intel i40e kvm iTCO_wdt dcdbas
> > ib_uverbs irqbypass iTCO_vendor_support mgag200 mei_me ib_core
> > dell_smbios isst_if_mmio isst_if_mbox_pci rapl i2c_algo_bit
> > drm_shmem_helper intel_cstate drm_kms_helper syscopyarea
> > sysfillrect isst_if_common sysimgblt intel_uncore fb_sys_fops
> > dell_wmi_descriptor wmi_bmof intel_vsec mei i2c_i801 acpi_ipmi
> > ipmi_si i2c_smbus ipmi_devintf intel_pch_thermal acpi_power_meter
> > pcspk
> >   r
> >
> > Fixes: efe41860008e ("ice: Fix memory corruption in VF driver")
> > Fixes: f23df5220d2b ("ice: Fix spurious interrupt during removal of
> > trusted VF")
> > Signed-off-by: Petr Oros <[email protected]>
>
> Thank you for all of your testing efforts, detailed explanation,
> and the fix!
>
> Is there anything interesting about your setup/methodology?
> (Asking rather to explore and extend our tests, no doubts here :)

This test is part of our QE test suite. There are a large number of
"cfg stress" tests.
I can't explain all the tests, but this one in particular just created
2 threads.
One attaching/detaching VF from VM and the other sets VF parameters in
a loop (trust, spoof check, mac addr) This causes concurrent access to
resources during reset. ice/iavf are very sensitive to this type of
stress ;)

We also using LNST, but it currently catches other types of bugs.

>
> > ---
> >   drivers/net/ethernet/intel/ice/ice_vf_lib.c | 15 ++++++++-------
> >   1 file changed, 8 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.c
> > b/drivers/net/ethernet/intel/ice/ice_vf_lib.c
> > index 294e91c3453ccd..ea3310be8354cf 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_vf_lib.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.c
> > @@ -612,11 +612,17 @@ int ice_reset_vf(struct ice_vf *vf, u32
> > flags)
> >                 return 0;
> >         }
> >  
> > +       if (flags & ICE_VF_RESET_LOCK)
> > +               mutex_lock(&vf->cfg_lock);
> > +       else
> > +               lockdep_assert_held(&vf->cfg_lock);
> > +
> >         if (ice_is_vf_disabled(vf)) {
> >                 vsi = ice_get_vf_vsi(vf);
> >                 if (!vsi) {
> >                         dev_dbg(dev, "VF is already removed\n");
> > -                       return -EINVAL;
> > +                       err = -EINVAL;
> > +                       goto out_unlock;
> >                 }
> >                 ice_vsi_stop_lan_tx_rings(vsi, ICE_NO_RESET, vf-
> > >vf_id);
> >  
> > @@ -625,14 +631,9 @@ int ice_reset_vf(struct ice_vf *vf, u32 flags)
> >  
> >                 dev_dbg(dev, "VF is already disabled, there is no
> > need for resetting it, telling VM, all is fine %d\n",
> >                         vf->vf_id);
> > -               return 0;
> > +               goto out_unlock;
> >         }
> >  
> > -       if (flags & ICE_VF_RESET_LOCK)
> > -               mutex_lock(&vf->cfg_lock);
> > -       else
> > -               lockdep_assert_held(&vf->cfg_lock);
> > -
> >         /* Set VF disable bit state here, before triggering reset
> > */
> >         set_bit(ICE_VF_STATE_DIS, vf->vf_states);
> >         ice_trigger_vf_reset(vf, flags & ICE_VF_RESET_VFLR, false);
>