2019-02-18 09:57:15

by Vadim Lomovtsev

[permalink] [raw]
Subject: [PATCH v2 0/8] nic: thunderx: fix communication races between VF & PF

From: Vadim Lomovtsev <[email protected]>

The ThunderX CN88XX NIC Virtual Function driver uses mailbox interface
to communicate to physical function driver. Each of VF has it's own pair
of mailbox registers to read from and write to. The mailbox registers
has no protection from possible races, so it has to be implemented
at software side.

After long term testing by loop of 'ip link set <ifname> up/down'
command it was found that there are two possible scenarios when
race condition appears:
1. VF receives link change message from PF and VF send RX mode
configuration message to PF in the same time from separate thread.
2. PF receives RX mode configuration from VF and in the same time,
in separate thread PF detects link status change and sends appropriate
message to particular VF.

Both cases leads to mailbox data to be rewritten, NIC VF messaging control
data to be updated incorrectly and communication sequence gets broken.

This patch series is to address race condition with VF & PF communication.

Vadim Lomovtsev (8):
net: thunderx: correct typo in macro name
net: thunderx: replace global nicvf_rx_mode_wq work queue for all VFs
to private for each of them.
net: thunderx: make CFG_DONE message to run through generic send-ack
sequence
net: thunderx: add nicvf_send_msg_to_pf result check for
set_rx_mode_task
net: thunderx: rework xcast message structure to make it fit into 64
bit
net: thunderx: add mutex to protect mailbox from concurrent calls for
same VF
net: thunderx: add LINK_CHANGE message handler at nicpf
net: thunderx: remove link change polling code and info from nicpf

drivers/net/ethernet/cavium/thunder/nic.h | 14 +-
.../net/ethernet/cavium/thunder/nic_main.c | 149 ++++++------------
.../net/ethernet/cavium/thunder/nicvf_main.c | 133 +++++++++++-----
.../net/ethernet/cavium/thunder/thunder_bgx.c | 2 +-
.../net/ethernet/cavium/thunder/thunder_bgx.h | 2 +-
5 files changed, 147 insertions(+), 153 deletions(-)

--
2.17.2


2019-02-18 09:54:59

by Vadim Lomovtsev

[permalink] [raw]
Subject: [PATCH v2 3/8] net: thunderx: make CFG_DONE message to run through generic send-ack sequence

At the end of NIC VF initialization VF sends CFG_DONE message to PF without
using nicvf_msg_send_to_pf routine. This potentially could re-write data in
mailbox. This commit is to implement common way of sending CFG_DONE message
by the same way with other configuration messages by using
nicvf_send_msg_to_pf() routine.

Signed-off-by: Vadim Lomovtsev <[email protected]>
---
drivers/net/ethernet/cavium/thunder/nic_main.c | 2 +-
.../net/ethernet/cavium/thunder/nicvf_main.c | 18 +++++++++++++++---
2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nic_main.c b/drivers/net/ethernet/cavium/thunder/nic_main.c
index 6c8dcb65ff03..90497a27df18 100644
--- a/drivers/net/ethernet/cavium/thunder/nic_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nic_main.c
@@ -1039,7 +1039,7 @@ static void nic_handle_mbx_intr(struct nicpf *nic, int vf)
case NIC_MBOX_MSG_CFG_DONE:
/* Last message of VF config msg sequence */
nic_enable_vf(nic, vf, true);
- goto unlock;
+ break;
case NIC_MBOX_MSG_SHUTDOWN:
/* First msg in VF teardown sequence */
if (vf >= nic->num_vf_en)
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index abf24e7dff2d..b0e8a04e0f1e 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -169,6 +169,20 @@ static int nicvf_check_pf_ready(struct nicvf *nic)
return 1;
}

+static int nicvf_send_cfg_done(struct nicvf *nic)
+{
+ union nic_mbx mbx = {};
+
+ mbx.msg.msg = NIC_MBOX_MSG_CFG_DONE;
+ if (nicvf_send_msg_to_pf(nic, &mbx)) {
+ netdev_err(nic->netdev,
+ "PF didn't respond to CFG DONE msg\n");
+ return 0;
+ }
+
+ return 1;
+}
+
static void nicvf_read_bgx_stats(struct nicvf *nic, struct bgx_stats_msg *bgx)
{
if (bgx->rx)
@@ -1416,7 +1430,6 @@ int nicvf_open(struct net_device *netdev)
struct nicvf *nic = netdev_priv(netdev);
struct queue_set *qs = nic->qs;
struct nicvf_cq_poll *cq_poll = NULL;
- union nic_mbx mbx = {};

/* wait till all queued set_rx_mode tasks completes if any */
drain_workqueue(nic->nicvf_rx_mode_wq);
@@ -1515,8 +1528,7 @@ int nicvf_open(struct net_device *netdev)
nicvf_enable_intr(nic, NICVF_INTR_RBDR, qidx);

/* Send VF config done msg to PF */
- mbx.msg.msg = NIC_MBOX_MSG_CFG_DONE;
- nicvf_write_to_mbx(nic, &mbx);
+ nicvf_send_cfg_done(nic);

return 0;
cleanup:
--
2.17.2

2019-02-18 09:55:26

by Vadim Lomovtsev

[permalink] [raw]
Subject: [PATCH v2 7/8] net: thunderx: add LINK_CHANGE message handler at nicpf

Move the link change polling task to VF side in order to
prevent races between VF and PF while sending link change
message(s). This commit is to implement link change request
to be initiated by VF.

Signed-off-by: Vadim Lomovtsev <[email protected]>
---
drivers/net/ethernet/cavium/thunder/nic.h | 2 +-
.../net/ethernet/cavium/thunder/nic_main.c | 39 ++++++++++++--
.../net/ethernet/cavium/thunder/nicvf_main.c | 54 +++++++++++++------
3 files changed, 76 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nic.h b/drivers/net/ethernet/cavium/thunder/nic.h
index 86cda3f4b37b..62636c1ed141 100644
--- a/drivers/net/ethernet/cavium/thunder/nic.h
+++ b/drivers/net/ethernet/cavium/thunder/nic.h
@@ -331,7 +331,7 @@ struct nicvf {
struct workqueue_struct *nicvf_rx_mode_wq;
/* mutex to protect VF's mailbox contents from concurrent access */
struct mutex rx_mode_mtx;
-
+ struct delayed_work link_change_work;
/* PTP timestamp */
struct cavium_ptp *ptp_clock;
/* Inbound timestamping is on */
diff --git a/drivers/net/ethernet/cavium/thunder/nic_main.c b/drivers/net/ethernet/cavium/thunder/nic_main.c
index 620dbe082ca0..8ab71dae3988 100644
--- a/drivers/net/ethernet/cavium/thunder/nic_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nic_main.c
@@ -929,6 +929,35 @@ static void nic_config_timestamp(struct nicpf *nic, int vf, struct set_ptp *ptp)
nic_reg_write(nic, NIC_PF_PKIND_0_15_CFG | (pkind_idx << 3), pkind_val);
}

+static void nic_link_status_get(struct nicpf *nic, u8 vf)
+{
+ union nic_mbx mbx = {};
+ struct bgx_link_status link;
+ u8 bgx, lmac;
+
+ mbx.link_status.msg = NIC_MBOX_MSG_BGX_LINK_CHANGE;
+
+ /* Get BGX, LMAC indices for the VF */
+ bgx = NIC_GET_BGX_FROM_VF_LMAC_MAP(nic->vf_lmac_map[vf]);
+ lmac = NIC_GET_LMAC_FROM_VF_LMAC_MAP(nic->vf_lmac_map[vf]);
+
+ /* Get interface link status */
+ bgx_get_lmac_link_state(nic->node, bgx, lmac, &link);
+
+ nic->link[vf] = link.link_up;
+ nic->duplex[vf] = link.duplex;
+ nic->speed[vf] = link.speed;
+
+ /* Send a mbox message to VF with current link status */
+ mbx.link_status.link_up = link.link_up;
+ mbx.link_status.duplex = link.duplex;
+ mbx.link_status.speed = link.speed;
+ mbx.link_status.mac_type = link.mac_type;
+
+ /* reply with link status */
+ nic_send_msg_to_vf(nic, vf, &mbx);
+}
+
/* Interrupt handler to handle mailbox messages from VFs */
static void nic_handle_mbx_intr(struct nicpf *nic, int vf)
{
@@ -1108,6 +1137,13 @@ static void nic_handle_mbx_intr(struct nicpf *nic, int vf)
lmac = NIC_GET_LMAC_FROM_VF_LMAC_MAP(nic->vf_lmac_map[vf]);
bgx_set_xcast_mode(nic->node, bgx, lmac, mbx.xcast.mode);
break;
+ case NIC_MBOX_MSG_BGX_LINK_CHANGE:
+ if (vf >= nic->num_vf_en) {
+ ret = -1; /* NACK */
+ break;
+ }
+ nic_link_status_get(nic, vf);
+ goto unlock;
default:
dev_err(&nic->pdev->dev,
"Invalid msg from VF%d, msg 0x%x\n", vf, mbx.msg.msg);
@@ -1419,9 +1455,6 @@ static int nic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
goto err_disable_sriov;
}

- INIT_DELAYED_WORK(&nic->dwork, nic_poll_for_link);
- queue_delayed_work(nic->check_link, &nic->dwork, 0);
-
return 0;

err_disable_sriov:
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index a05e2989ec76..2ecacd0e1b51 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -245,21 +245,24 @@ static void nicvf_handle_mbx_intr(struct nicvf *nic)
break;
case NIC_MBOX_MSG_BGX_LINK_CHANGE:
nic->pf_acked = true;
- nic->link_up = mbx.link_status.link_up;
- nic->duplex = mbx.link_status.duplex;
- nic->speed = mbx.link_status.speed;
- nic->mac_type = mbx.link_status.mac_type;
- if (nic->link_up) {
- netdev_info(nic->netdev, "Link is Up %d Mbps %s duplex\n",
- nic->speed,
- nic->duplex == DUPLEX_FULL ?
- "Full" : "Half");
- netif_carrier_on(nic->netdev);
- netif_tx_start_all_queues(nic->netdev);
- } else {
- netdev_info(nic->netdev, "Link is Down\n");
- netif_carrier_off(nic->netdev);
- netif_tx_stop_all_queues(nic->netdev);
+ if (nic->link_up != mbx.link_status.link_up) {
+ nic->link_up = mbx.link_status.link_up;
+ nic->duplex = mbx.link_status.duplex;
+ nic->speed = mbx.link_status.speed;
+ nic->mac_type = mbx.link_status.mac_type;
+ if (nic->link_up) {
+ netdev_info(nic->netdev,
+ "Link is Up %d Mbps %s duplex\n",
+ nic->speed,
+ nic->duplex == DUPLEX_FULL ?
+ "Full" : "Half");
+ netif_carrier_on(nic->netdev);
+ netif_tx_start_all_queues(nic->netdev);
+ } else {
+ netdev_info(nic->netdev, "Link is Down\n");
+ netif_carrier_off(nic->netdev);
+ netif_tx_stop_all_queues(nic->netdev);
+ }
}
break;
case NIC_MBOX_MSG_ALLOC_SQS:
@@ -1328,6 +1331,8 @@ int nicvf_stop(struct net_device *netdev)
struct nicvf_cq_poll *cq_poll = NULL;
union nic_mbx mbx = {};

+ cancel_delayed_work_sync(&nic->link_change_work);
+
/* wait till all queued set_rx_mode tasks completes */
drain_workqueue(nic->nicvf_rx_mode_wq);

@@ -1430,6 +1435,20 @@ static int nicvf_update_hw_max_frs(struct nicvf *nic, int mtu)
return nicvf_send_msg_to_pf(nic, &mbx);
}

+static void nicvf_link_status_check_task(struct work_struct *work_arg)
+{
+ struct nicvf *nic = container_of(work_arg,
+ struct nicvf,
+ link_change_work.work);
+ struct net_device *netdev = nic->netdev;
+ union nic_mbx mbx = {};
+
+ mbx.msg.msg = NIC_MBOX_MSG_BGX_LINK_CHANGE;
+ nicvf_send_msg_to_pf(nic, &mbx);
+ queue_delayed_work(nic->nicvf_rx_mode_wq,
+ &nic->link_change_work, 2 * HZ);
+}
+
int nicvf_open(struct net_device *netdev)
{
int cpu, err, qidx;
@@ -1536,6 +1555,11 @@ int nicvf_open(struct net_device *netdev)
/* Send VF config done msg to PF */
nicvf_send_cfg_done(nic);

+ INIT_DELAYED_WORK(&nic->link_change_work,
+ nicvf_link_status_check_task);
+ queue_delayed_work(nic->nicvf_rx_mode_wq,
+ &nic->link_change_work, 0);
+
return 0;
cleanup:
nicvf_disable_intr(nic, NICVF_INTR_MBOX, 0);
--
2.17.2

2019-02-18 09:55:31

by Vadim Lomovtsev

[permalink] [raw]
Subject: [PATCH v2 5/8] net: thunderx: rework xcast message structure to make it fit into 64 bit

To communicate to PF each of ThunderX NIC VF uses mailbox which is
pair of 64 bit registers available to both VFn and PF.

This commit is to change the xcast message structure in order to
fit it into 64 bit.

Signed-off-by: Vadim Lomovtsev <[email protected]>
---
drivers/net/ethernet/cavium/thunder/nic.h | 6 ++----
drivers/net/ethernet/cavium/thunder/nic_main.c | 4 ++--
drivers/net/ethernet/cavium/thunder/nicvf_main.c | 6 +++---
3 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nic.h b/drivers/net/ethernet/cavium/thunder/nic.h
index 376a96bce33f..227343625e83 100644
--- a/drivers/net/ethernet/cavium/thunder/nic.h
+++ b/drivers/net/ethernet/cavium/thunder/nic.h
@@ -577,10 +577,8 @@ struct set_ptp {

struct xcast {
u8 msg;
- union {
- u8 mode;
- u64 mac;
- } data;
+ u8 mode;
+ u64 mac:48;
};

/* 128 bit shared memory between PF and each VF */
diff --git a/drivers/net/ethernet/cavium/thunder/nic_main.c b/drivers/net/ethernet/cavium/thunder/nic_main.c
index 90497a27df18..620dbe082ca0 100644
--- a/drivers/net/ethernet/cavium/thunder/nic_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nic_main.c
@@ -1094,7 +1094,7 @@ static void nic_handle_mbx_intr(struct nicpf *nic, int vf)
bgx = NIC_GET_BGX_FROM_VF_LMAC_MAP(nic->vf_lmac_map[vf]);
lmac = NIC_GET_LMAC_FROM_VF_LMAC_MAP(nic->vf_lmac_map[vf]);
bgx_set_dmac_cam_filter(nic->node, bgx, lmac,
- mbx.xcast.data.mac,
+ mbx.xcast.mac,
vf < NIC_VF_PER_MBX_REG ? vf :
vf - NIC_VF_PER_MBX_REG);
break;
@@ -1106,7 +1106,7 @@ static void nic_handle_mbx_intr(struct nicpf *nic, int vf)
}
bgx = NIC_GET_BGX_FROM_VF_LMAC_MAP(nic->vf_lmac_map[vf]);
lmac = NIC_GET_LMAC_FROM_VF_LMAC_MAP(nic->vf_lmac_map[vf]);
- bgx_set_xcast_mode(nic->node, bgx, lmac, mbx.xcast.data.mode);
+ bgx_set_xcast_mode(nic->node, bgx, lmac, mbx.xcast.mode);
break;
default:
dev_err(&nic->pdev->dev,
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index dbd8862d60d6..30c7f54b4f17 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -1964,7 +1964,7 @@ static void __nicvf_set_rx_mode_task(u8 mode, struct xcast_addr_list *mc_addrs,
* its' own LMAC to the filter to accept packets for it.
*/
mbx.xcast.msg = NIC_MBOX_MSG_ADD_MCAST;
- mbx.xcast.data.mac = 0;
+ mbx.xcast.mac = 0;
if (nicvf_send_msg_to_pf(nic, &mbx) < 0)
goto free_mc;
}
@@ -1974,7 +1974,7 @@ static void __nicvf_set_rx_mode_task(u8 mode, struct xcast_addr_list *mc_addrs,
/* now go through kernel list of MACs and add them one by one */
for (idx = 0; idx < mc_addrs->count; idx++) {
mbx.xcast.msg = NIC_MBOX_MSG_ADD_MCAST;
- mbx.xcast.data.mac = mc_addrs->mc[idx];
+ mbx.xcast.mac = mc_addrs->mc[idx];
if (nicvf_send_msg_to_pf(nic, &mbx) < 0)
goto free_mc;
}
@@ -1982,7 +1982,7 @@ static void __nicvf_set_rx_mode_task(u8 mode, struct xcast_addr_list *mc_addrs,

/* and finally set rx mode for PF accordingly */
mbx.xcast.msg = NIC_MBOX_MSG_SET_XCAST;
- mbx.xcast.data.mode = mode;
+ mbx.xcast.mode = mode;

nicvf_send_msg_to_pf(nic, &mbx);
free_mc:
--
2.17.2

2019-02-18 09:56:06

by Vadim Lomovtsev

[permalink] [raw]
Subject: [PATCH v2 1/8] net: thunderx: correct typo in macro name

Correct STREERING to STEERING at macro name for BGX steering register.

Signed-off-by: Vadim Lomovtsev <[email protected]>
---
drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 2 +-
drivers/net/ethernet/cavium/thunder/thunder_bgx.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
index e337da6ba2a4..673c57b8023f 100644
--- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
+++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
@@ -1217,7 +1217,7 @@ static void bgx_init_hw(struct bgx *bgx)

/* Disable MAC steering (NCSI traffic) */
for (i = 0; i < RX_TRAFFIC_STEER_RULE_COUNT; i++)
- bgx_reg_write(bgx, 0, BGX_CMR_RX_STREERING + (i * 8), 0x00);
+ bgx_reg_write(bgx, 0, BGX_CMR_RX_STEERING + (i * 8), 0x00);
}

static u8 bgx_get_lane2sds_cfg(struct bgx *bgx, struct lmac *lmac)
diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.h b/drivers/net/ethernet/cavium/thunder/thunder_bgx.h
index cbdd20b9ee6f..5cbc54e9eb19 100644
--- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.h
+++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.h
@@ -60,7 +60,7 @@
#define RX_DMACX_CAM_EN BIT_ULL(48)
#define RX_DMACX_CAM_LMACID(x) (((u64)x) << 49)
#define RX_DMAC_COUNT 32
-#define BGX_CMR_RX_STREERING 0x300
+#define BGX_CMR_RX_STEERING 0x300
#define RX_TRAFFIC_STEER_RULE_COUNT 8
#define BGX_CMR_CHAN_MSK_AND 0x450
#define BGX_CMR_BIST_STATUS 0x460
--
2.17.2

2019-02-18 09:56:12

by Vadim Lomovtsev

[permalink] [raw]
Subject: [PATCH v2 4/8] net: thunderx: add nicvf_send_msg_to_pf result check for set_rx_mode_task

The rx_set_mode invokes number of messages to be send to PF for receive
mode configuration. In case if there any issues we need to stop sending
messages and release allocated memory.

This commit is to implement check of nicvf_msg_send_to_pf() result.

Signed-off-by: Vadim Lomovtsev <[email protected]>
---
drivers/net/ethernet/cavium/thunder/nicvf_main.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index b0e8a04e0f1e..dbd8862d60d6 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -1956,7 +1956,8 @@ static void __nicvf_set_rx_mode_task(u8 mode, struct xcast_addr_list *mc_addrs,

/* flush DMAC filters and reset RX mode */
mbx.xcast.msg = NIC_MBOX_MSG_RESET_XCAST;
- nicvf_send_msg_to_pf(nic, &mbx);
+ if (nicvf_send_msg_to_pf(nic, &mbx) < 0)
+ goto free_mc;

if (mode & BGX_XCAST_MCAST_FILTER) {
/* once enabling filtering, we need to signal to PF to add
@@ -1964,7 +1965,8 @@ static void __nicvf_set_rx_mode_task(u8 mode, struct xcast_addr_list *mc_addrs,
*/
mbx.xcast.msg = NIC_MBOX_MSG_ADD_MCAST;
mbx.xcast.data.mac = 0;
- nicvf_send_msg_to_pf(nic, &mbx);
+ if (nicvf_send_msg_to_pf(nic, &mbx) < 0)
+ goto free_mc;
}

/* check if we have any specific MACs to be added to PF DMAC filter */
@@ -1973,9 +1975,9 @@ static void __nicvf_set_rx_mode_task(u8 mode, struct xcast_addr_list *mc_addrs,
for (idx = 0; idx < mc_addrs->count; idx++) {
mbx.xcast.msg = NIC_MBOX_MSG_ADD_MCAST;
mbx.xcast.data.mac = mc_addrs->mc[idx];
- nicvf_send_msg_to_pf(nic, &mbx);
+ if (nicvf_send_msg_to_pf(nic, &mbx) < 0)
+ goto free_mc;
}
- kfree(mc_addrs);
}

/* and finally set rx mode for PF accordingly */
@@ -1983,6 +1985,8 @@ static void __nicvf_set_rx_mode_task(u8 mode, struct xcast_addr_list *mc_addrs,
mbx.xcast.data.mode = mode;

nicvf_send_msg_to_pf(nic, &mbx);
+free_mc:
+ kfree(mc_addrs);
}

static void nicvf_set_rx_mode_task(struct work_struct *work_arg)
--
2.17.2

2019-02-18 09:56:49

by Vadim Lomovtsev

[permalink] [raw]
Subject: [PATCH v2 2/8] net: thunderx: replace global nicvf_rx_mode_wq work queue for all VFs to private for each of them.

Having one work queue for receive mode configuration ndo_set_rx_mode()
call for all VFs results in making each of them wait till the
set_rx_mode() call completes for another VF if any of close, set
receive mode and change flags calls being already invoked. Potentially
this could cause device state change before appropriate call of receive
mode configuration completes, so the call itself became meaningless,
corrupt data or break configuration sequence.

We don't need any delays in NIC VF configuration sequence so having delayed
work call with 0 delay has no sense.

This commit is to implement one work queue for each NIC VF for set_rx_mode
task and to let them work independently and replacing delayed_work
with work_struct.

Signed-off-by: Vadim Lomovtsev <[email protected]>
---
drivers/net/ethernet/cavium/thunder/nic.h | 4 ++-
.../net/ethernet/cavium/thunder/nicvf_main.c | 30 ++++++++++---------
2 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nic.h b/drivers/net/ethernet/cavium/thunder/nic.h
index f4d81765221e..376a96bce33f 100644
--- a/drivers/net/ethernet/cavium/thunder/nic.h
+++ b/drivers/net/ethernet/cavium/thunder/nic.h
@@ -271,7 +271,7 @@ struct xcast_addr_list {
};

struct nicvf_work {
- struct delayed_work work;
+ struct work_struct work;
u8 mode;
struct xcast_addr_list *mc;
};
@@ -327,6 +327,8 @@ struct nicvf {
struct nicvf_work rx_mode_work;
/* spinlock to protect workqueue arguments from concurrent access */
spinlock_t rx_mode_wq_lock;
+ /* workqueue for handling kernel ndo_set_rx_mode() calls */
+ struct workqueue_struct *nicvf_rx_mode_wq;

/* PTP timestamp */
struct cavium_ptp *ptp_clock;
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index 88f8a8fa93cd..abf24e7dff2d 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -68,9 +68,6 @@ module_param(cpi_alg, int, 0444);
MODULE_PARM_DESC(cpi_alg,
"PFC algorithm (0=none, 1=VLAN, 2=VLAN16, 3=IP Diffserv)");

-/* workqueue for handling kernel ndo_set_rx_mode() calls */
-static struct workqueue_struct *nicvf_rx_mode_wq;
-
static inline u8 nicvf_netdev_qidx(struct nicvf *nic, u8 qidx)
{
if (nic->sqs_mode)
@@ -1311,6 +1308,9 @@ int nicvf_stop(struct net_device *netdev)
struct nicvf_cq_poll *cq_poll = NULL;
union nic_mbx mbx = {};

+ /* wait till all queued set_rx_mode tasks completes */
+ drain_workqueue(nic->nicvf_rx_mode_wq);
+
mbx.msg.msg = NIC_MBOX_MSG_SHUTDOWN;
nicvf_send_msg_to_pf(nic, &mbx);

@@ -1418,6 +1418,9 @@ int nicvf_open(struct net_device *netdev)
struct nicvf_cq_poll *cq_poll = NULL;
union nic_mbx mbx = {};

+ /* wait till all queued set_rx_mode tasks completes if any */
+ drain_workqueue(nic->nicvf_rx_mode_wq);
+
netif_carrier_off(netdev);

err = nicvf_register_misc_interrupt(nic);
@@ -1973,7 +1976,7 @@ static void __nicvf_set_rx_mode_task(u8 mode, struct xcast_addr_list *mc_addrs,
static void nicvf_set_rx_mode_task(struct work_struct *work_arg)
{
struct nicvf_work *vf_work = container_of(work_arg, struct nicvf_work,
- work.work);
+ work);
struct nicvf *nic = container_of(vf_work, struct nicvf, rx_mode_work);
u8 mode;
struct xcast_addr_list *mc;
@@ -2030,7 +2033,7 @@ static void nicvf_set_rx_mode(struct net_device *netdev)
kfree(nic->rx_mode_work.mc);
nic->rx_mode_work.mc = mc_list;
nic->rx_mode_work.mode = mode;
- queue_delayed_work(nicvf_rx_mode_wq, &nic->rx_mode_work.work, 0);
+ queue_work(nic->nicvf_rx_mode_wq, &nic->rx_mode_work.work);
spin_unlock(&nic->rx_mode_wq_lock);
}

@@ -2187,7 +2190,10 @@ static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)

INIT_WORK(&nic->reset_task, nicvf_reset_task);

- INIT_DELAYED_WORK(&nic->rx_mode_work.work, nicvf_set_rx_mode_task);
+ nic->nicvf_rx_mode_wq = alloc_ordered_workqueue("nicvf_rx_mode_wq_VF%d",
+ WQ_MEM_RECLAIM,
+ nic->vf_id);
+ INIT_WORK(&nic->rx_mode_work.work, nicvf_set_rx_mode_task);
spin_lock_init(&nic->rx_mode_wq_lock);

err = register_netdev(netdev);
@@ -2228,13 +2234,15 @@ static void nicvf_remove(struct pci_dev *pdev)
nic = netdev_priv(netdev);
pnetdev = nic->pnicvf->netdev;

- cancel_delayed_work_sync(&nic->rx_mode_work.work);
-
/* Check if this Qset is assigned to different VF.
* If yes, clean primary and all secondary Qsets.
*/
if (pnetdev && (pnetdev->reg_state == NETREG_REGISTERED))
unregister_netdev(pnetdev);
+ if (nic->nicvf_rx_mode_wq) {
+ destroy_workqueue(nic->nicvf_rx_mode_wq);
+ nic->nicvf_rx_mode_wq = NULL;
+ }
nicvf_unregister_interrupts(nic);
pci_set_drvdata(pdev, NULL);
if (nic->drv_stats)
@@ -2261,17 +2269,11 @@ static struct pci_driver nicvf_driver = {
static int __init nicvf_init_module(void)
{
pr_info("%s, ver %s\n", DRV_NAME, DRV_VERSION);
- nicvf_rx_mode_wq = alloc_ordered_workqueue("nicvf_generic",
- WQ_MEM_RECLAIM);
return pci_register_driver(&nicvf_driver);
}

static void __exit nicvf_cleanup_module(void)
{
- if (nicvf_rx_mode_wq) {
- destroy_workqueue(nicvf_rx_mode_wq);
- nicvf_rx_mode_wq = NULL;
- }
pci_unregister_driver(&nicvf_driver);
}

--
2.17.2

2019-02-18 09:57:51

by Vadim Lomovtsev

[permalink] [raw]
Subject: [PATCH v2 8/8] net: thunderx: remove link change polling code and info from nicpf

Since link change polling routine was moved to nicvf side,
we don't need anymore polling function at nicpf side along
with link status info for all enabled Vfs as at VF side
this info is already tracked.

This commit is to remove unnecessary code & fields from
nicpf structure.

Signed-off-by: Vadim Lomovtsev <[email protected]>
---
.../net/ethernet/cavium/thunder/nic_main.c | 114 ++----------------
1 file changed, 12 insertions(+), 102 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nic_main.c b/drivers/net/ethernet/cavium/thunder/nic_main.c
index 8ab71dae3988..c90252829ed3 100644
--- a/drivers/net/ethernet/cavium/thunder/nic_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nic_main.c
@@ -57,14 +57,8 @@ struct nicpf {
#define NIC_GET_BGX_FROM_VF_LMAC_MAP(map) ((map >> 4) & 0xF)
#define NIC_GET_LMAC_FROM_VF_LMAC_MAP(map) (map & 0xF)
u8 *vf_lmac_map;
- struct delayed_work dwork;
- struct workqueue_struct *check_link;
- u8 *link;
- u8 *duplex;
- u32 *speed;
u16 cpi_base[MAX_NUM_VFS_SUPPORTED];
u16 rssi_base[MAX_NUM_VFS_SUPPORTED];
- bool mbx_lock[MAX_NUM_VFS_SUPPORTED];

/* MSI-X */
u8 num_vec;
@@ -929,6 +923,10 @@ static void nic_config_timestamp(struct nicpf *nic, int vf, struct set_ptp *ptp)
nic_reg_write(nic, NIC_PF_PKIND_0_15_CFG | (pkind_idx << 3), pkind_val);
}

+/* Get BGX LMAC link status and update corresponding VF
+ * if there is a change, valid only if internal L2 switch
+ * is not present otherwise VF link is always treated as up
+ */
static void nic_link_status_get(struct nicpf *nic, u8 vf)
{
union nic_mbx mbx = {};
@@ -944,10 +942,6 @@ static void nic_link_status_get(struct nicpf *nic, u8 vf)
/* Get interface link status */
bgx_get_lmac_link_state(nic->node, bgx, lmac, &link);

- nic->link[vf] = link.link_up;
- nic->duplex[vf] = link.duplex;
- nic->speed[vf] = link.speed;
-
/* Send a mbox message to VF with current link status */
mbx.link_status.link_up = link.link_up;
mbx.link_status.duplex = link.duplex;
@@ -970,8 +964,6 @@ static void nic_handle_mbx_intr(struct nicpf *nic, int vf)
int i;
int ret = 0;

- nic->mbx_lock[vf] = true;
-
mbx_addr = nic_get_mbx_addr(vf);
mbx_data = (u64 *)&mbx;

@@ -986,12 +978,7 @@ static void nic_handle_mbx_intr(struct nicpf *nic, int vf)
switch (mbx.msg.msg) {
case NIC_MBOX_MSG_READY:
nic_mbx_send_ready(nic, vf);
- if (vf < nic->num_vf_en) {
- nic->link[vf] = 0;
- nic->duplex[vf] = 0;
- nic->speed[vf] = 0;
- }
- goto unlock;
+ return;
case NIC_MBOX_MSG_QS_CFG:
reg_addr = NIC_PF_QSET_0_127_CFG |
(mbx.qs.num << NIC_QS_ID_SHIFT);
@@ -1060,7 +1047,7 @@ static void nic_handle_mbx_intr(struct nicpf *nic, int vf)
break;
case NIC_MBOX_MSG_RSS_SIZE:
nic_send_rss_size(nic, vf);
- goto unlock;
+ return;
case NIC_MBOX_MSG_RSS_CFG:
case NIC_MBOX_MSG_RSS_CFG_CONT:
nic_config_rss(nic, &mbx.rss_cfg);
@@ -1078,19 +1065,19 @@ static void nic_handle_mbx_intr(struct nicpf *nic, int vf)
break;
case NIC_MBOX_MSG_ALLOC_SQS:
nic_alloc_sqs(nic, &mbx.sqs_alloc);
- goto unlock;
+ return;
case NIC_MBOX_MSG_NICVF_PTR:
nic->nicvf[vf] = mbx.nicvf.nicvf;
break;
case NIC_MBOX_MSG_PNICVF_PTR:
nic_send_pnicvf(nic, vf);
- goto unlock;
+ return;
case NIC_MBOX_MSG_SNICVF_PTR:
nic_send_snicvf(nic, &mbx.nicvf);
- goto unlock;
+ return;
case NIC_MBOX_MSG_BGX_STATS:
nic_get_bgx_stats(nic, &mbx.bgx_stats);
- goto unlock;
+ return;
case NIC_MBOX_MSG_LOOPBACK:
ret = nic_config_loopback(nic, &mbx.lbk);
break;
@@ -1099,7 +1086,7 @@ static void nic_handle_mbx_intr(struct nicpf *nic, int vf)
break;
case NIC_MBOX_MSG_PFC:
nic_pause_frame(nic, vf, &mbx.pfc);
- goto unlock;
+ return;
case NIC_MBOX_MSG_PTP_CFG:
nic_config_timestamp(nic, vf, &mbx.ptp);
break;
@@ -1143,7 +1130,7 @@ static void nic_handle_mbx_intr(struct nicpf *nic, int vf)
break;
}
nic_link_status_get(nic, vf);
- goto unlock;
+ return;
default:
dev_err(&nic->pdev->dev,
"Invalid msg from VF%d, msg 0x%x\n", vf, mbx.msg.msg);
@@ -1157,8 +1144,6 @@ static void nic_handle_mbx_intr(struct nicpf *nic, int vf)
mbx.msg.msg, vf);
nic_mbx_send_nack(nic, vf);
}
-unlock:
- nic->mbx_lock[vf] = false;
}

static irqreturn_t nic_mbx_intr_handler(int irq, void *nic_irq)
@@ -1306,52 +1291,6 @@ static int nic_sriov_init(struct pci_dev *pdev, struct nicpf *nic)
return 0;
}

-/* Poll for BGX LMAC link status and update corresponding VF
- * if there is a change, valid only if internal L2 switch
- * is not present otherwise VF link is always treated as up
- */
-static void nic_poll_for_link(struct work_struct *work)
-{
- union nic_mbx mbx = {};
- struct nicpf *nic;
- struct bgx_link_status link;
- u8 vf, bgx, lmac;
-
- nic = container_of(work, struct nicpf, dwork.work);
-
- mbx.link_status.msg = NIC_MBOX_MSG_BGX_LINK_CHANGE;
-
- for (vf = 0; vf < nic->num_vf_en; vf++) {
- /* Poll only if VF is UP */
- if (!nic->vf_enabled[vf])
- continue;
-
- /* Get BGX, LMAC indices for the VF */
- bgx = NIC_GET_BGX_FROM_VF_LMAC_MAP(nic->vf_lmac_map[vf]);
- lmac = NIC_GET_LMAC_FROM_VF_LMAC_MAP(nic->vf_lmac_map[vf]);
- /* Get interface link status */
- bgx_get_lmac_link_state(nic->node, bgx, lmac, &link);
-
- /* Inform VF only if link status changed */
- if (nic->link[vf] == link.link_up)
- continue;
-
- if (!nic->mbx_lock[vf]) {
- nic->link[vf] = link.link_up;
- nic->duplex[vf] = link.duplex;
- nic->speed[vf] = link.speed;
-
- /* Send a mbox message to VF with current link status */
- mbx.link_status.link_up = link.link_up;
- mbx.link_status.duplex = link.duplex;
- mbx.link_status.speed = link.speed;
- mbx.link_status.mac_type = link.mac_type;
- nic_send_msg_to_vf(nic, vf, &mbx);
- }
- }
- queue_delayed_work(nic->check_link, &nic->dwork, HZ * 2);
-}
-
static int nic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
{
struct device *dev = &pdev->dev;
@@ -1420,18 +1359,6 @@ static int nic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
if (!nic->vf_lmac_map)
goto err_release_regions;

- nic->link = devm_kmalloc_array(dev, max_lmac, sizeof(u8), GFP_KERNEL);
- if (!nic->link)
- goto err_release_regions;
-
- nic->duplex = devm_kmalloc_array(dev, max_lmac, sizeof(u8), GFP_KERNEL);
- if (!nic->duplex)
- goto err_release_regions;
-
- nic->speed = devm_kmalloc_array(dev, max_lmac, sizeof(u32), GFP_KERNEL);
- if (!nic->speed)
- goto err_release_regions;
-
/* Initialize hardware */
nic_init_hw(nic);

@@ -1447,19 +1374,8 @@ static int nic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
if (err)
goto err_unregister_interrupts;

- /* Register a physical link status poll fn() */
- nic->check_link = alloc_workqueue("check_link_status",
- WQ_UNBOUND | WQ_MEM_RECLAIM, 1);
- if (!nic->check_link) {
- err = -ENOMEM;
- goto err_disable_sriov;
- }
-
return 0;

-err_disable_sriov:
- if (nic->flags & NIC_SRIOV_ENABLED)
- pci_disable_sriov(pdev);
err_unregister_interrupts:
nic_unregister_interrupts(nic);
err_release_regions:
@@ -1480,12 +1396,6 @@ static void nic_remove(struct pci_dev *pdev)
if (nic->flags & NIC_SRIOV_ENABLED)
pci_disable_sriov(pdev);

- if (nic->check_link) {
- /* Destroy work Queue */
- cancel_delayed_work_sync(&nic->dwork);
- destroy_workqueue(nic->check_link);
- }
-
nic_unregister_interrupts(nic);
pci_release_regions(pdev);

--
2.17.2

2019-02-18 09:58:34

by Vadim Lomovtsev

[permalink] [raw]
Subject: [PATCH v2 6/8] net: thunderx: add mutex to protect mailbox from concurrent calls for same VF

In some cases it could happen that nicvf_send_msg_to_pf() could be called
concurrently for the same NIC VF, and thus re-writing mailbox contents and
breaking messaging sequence with PF by re-writing NICVF data.

This commit is to implement mutex for NICVF to protect mailbox registers
and NICVF messaging control data from concurrent access.

Signed-off-by: Vadim Lomovtsev <[email protected]>
---
drivers/net/ethernet/cavium/thunder/nic.h | 2 ++
drivers/net/ethernet/cavium/thunder/nicvf_main.c | 13 ++++++++++---
2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nic.h b/drivers/net/ethernet/cavium/thunder/nic.h
index 227343625e83..86cda3f4b37b 100644
--- a/drivers/net/ethernet/cavium/thunder/nic.h
+++ b/drivers/net/ethernet/cavium/thunder/nic.h
@@ -329,6 +329,8 @@ struct nicvf {
spinlock_t rx_mode_wq_lock;
/* workqueue for handling kernel ndo_set_rx_mode() calls */
struct workqueue_struct *nicvf_rx_mode_wq;
+ /* mutex to protect VF's mailbox contents from concurrent access */
+ struct mutex rx_mode_mtx;

/* PTP timestamp */
struct cavium_ptp *ptp_clock;
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index 30c7f54b4f17..a05e2989ec76 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -124,6 +124,9 @@ int nicvf_send_msg_to_pf(struct nicvf *nic, union nic_mbx *mbx)
{
int timeout = NIC_MBOX_MSG_TIMEOUT;
int sleep = 10;
+ int ret = 0;
+
+ mutex_lock(&nic->rx_mode_mtx);

nic->pf_acked = false;
nic->pf_nacked = false;
@@ -136,7 +139,8 @@ int nicvf_send_msg_to_pf(struct nicvf *nic, union nic_mbx *mbx)
netdev_err(nic->netdev,
"PF NACK to mbox msg 0x%02x from VF%d\n",
(mbx->msg.msg & 0xFF), nic->vf_id);
- return -EINVAL;
+ ret = -EINVAL;
+ break;
}
msleep(sleep);
if (nic->pf_acked)
@@ -146,10 +150,12 @@ int nicvf_send_msg_to_pf(struct nicvf *nic, union nic_mbx *mbx)
netdev_err(nic->netdev,
"PF didn't ACK to mbox msg 0x%02x from VF%d\n",
(mbx->msg.msg & 0xFF), nic->vf_id);
- return -EBUSY;
+ ret = -EBUSY;
+ break;
}
}
- return 0;
+ mutex_unlock(&nic->rx_mode_mtx);
+ return ret;
}

/* Checks if VF is able to comminicate with PF
@@ -2211,6 +2217,7 @@ static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
nic->vf_id);
INIT_WORK(&nic->rx_mode_work.work, nicvf_set_rx_mode_task);
spin_lock_init(&nic->rx_mode_wq_lock);
+ mutex_init(&nic->rx_mode_mtx);

err = register_netdev(netdev);
if (err) {
--
2.17.2

2019-02-19 02:44:14

by David Miller

[permalink] [raw]
Subject: Re: [PATCH v2 3/8] net: thunderx: make CFG_DONE message to run through generic send-ack sequence

From: Vadim Lomovtsev <[email protected]>
Date: Mon, 18 Feb 2019 09:52:14 +0000

> @@ -169,6 +169,20 @@ static int nicvf_check_pf_ready(struct nicvf *nic)
> return 1;
> }
>
> +static int nicvf_send_cfg_done(struct nicvf *nic)
> +{
> + union nic_mbx mbx = {};
> +
> + mbx.msg.msg = NIC_MBOX_MSG_CFG_DONE;
> + if (nicvf_send_msg_to_pf(nic, &mbx)) {
> + netdev_err(nic->netdev,
> + "PF didn't respond to CFG DONE msg\n");
> + return 0;
> + }
> +
> + return 1;
> +}
...
> @@ -1515,8 +1528,7 @@ int nicvf_open(struct net_device *netdev)
> nicvf_enable_intr(nic, NICVF_INTR_RBDR, qidx);
>
> /* Send VF config done msg to PF */
> - mbx.msg.msg = NIC_MBOX_MSG_CFG_DONE;
> - nicvf_write_to_mbx(nic, &mbx);
> + nicvf_send_cfg_done(nic);

If the one and only call site doesn't even bother to check the return
value, just make it return void.

Thanks.

2019-02-19 11:22:43

by Vadim Lomovtsev

[permalink] [raw]
Subject: Re: [EXT] Re: [PATCH v2 3/8] net: thunderx: make CFG_DONE message to run through generic send-ack sequence

Hi David,

On Mon, Feb 18, 2019 at 03:33:10PM -0800, David Miller wrote:
> From: Vadim Lomovtsev <[email protected]>
> Date: Mon, 18 Feb 2019 09:52:14 +0000
>
> > @@ -169,6 +169,20 @@ static int nicvf_check_pf_ready(struct nicvf *nic)
> > return 1;
> > }
> >
> > +static int nicvf_send_cfg_done(struct nicvf *nic)
> > +{
> > + union nic_mbx mbx = {};
> > +
> > + mbx.msg.msg = NIC_MBOX_MSG_CFG_DONE;
> > + if (nicvf_send_msg_to_pf(nic, &mbx)) {
> > + netdev_err(nic->netdev,
> > + "PF didn't respond to CFG DONE msg\n");
> > + return 0;
> > + }
> > +
> > + return 1;
> > +}
> ...
> > @@ -1515,8 +1528,7 @@ int nicvf_open(struct net_device *netdev)
> > nicvf_enable_intr(nic, NICVF_INTR_RBDR, qidx);
> >
> > /* Send VF config done msg to PF */
> > - mbx.msg.msg = NIC_MBOX_MSG_CFG_DONE;
> > - nicvf_write_to_mbx(nic, &mbx);
> > + nicvf_send_cfg_done(nic);
>
> If the one and only call site doesn't even bother to check the return
> value, just make it return void.
>
> Thanks.

Thank you for your time and comments.
I'll update patch and re-submit.

WBR,
Vadim