2020-01-11 08:35:42

by Huazhong Tan

[permalink] [raw]
Subject: [PATCH net-next 0/7] net: hns3: add some misc update about reset issue

This series includes some misc update relating to reset issue.
[patch 1/7] & [patch 2/7] splits hclge_reset()/hclgevf_reset()
into two parts: preparing and rebuilding. Since the procedure
of FLR should be separated out from the reset task([patch 3/7 &
patch 3/7]), then the FLR's processing can reuse these codes.

pci_error_handlers.reset_prepare() is void type function, so
[patch 6/7] & [patch 7/7] factor some codes related to PF
function reset to make the preparing done before .reset_prepare()
return.

BTW, [patch 5/7] enlarges the waiting time of reset for matching
the hardware's.

Huazhong Tan (7):
net: hns3: split hclge_reset() into preparing and rebuilding part
net: hns3: split hclgevf_reset() into preparing and rebuilding part
net: hns3: refactor the precedure of PF FLR
net: hns3: refactor the procedure of VF FLR
net: hns3: enlarge HCLGE_RESET_WAIT_CNT
net: hns3: modify hclge_func_reset_sync_vf()'s return type to void
net: hns3: refactor the notification scheme of PF reset

drivers/net/ethernet/hisilicon/hns3/hnae3.h | 5 -
.../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 204 ++++++++++-----------
.../ethernet/hisilicon/hns3/hns3pf/hclge_main.h | 1 +
.../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 195 ++++++++++----------
.../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h | 1 +
5 files changed, 194 insertions(+), 212 deletions(-)

--
2.7.4


2020-01-11 08:35:45

by Huazhong Tan

[permalink] [raw]
Subject: [PATCH net-next 3/7] net: hns3: refactor the precedure of PF FLR

Currently, the actual work of PF FLR is handled in the reset task,
which is asynchronous. So in some case, if the preparing and
rebuilding are not done, then the PF FLR will trigger some problems,
for example, makes hardware go into chaos.

So this patch separates the process of PF FLR from reset task, and
adds a semaphore to serialize this reset and others.

When FLR's preparing fails, if there has other higher level reset
pending or failing times less than the HCLGE_FLR_RETRY_CNT, this
preparing should be retried, otherwise PF and its VF may get into
wrong state.

BTW, while the hardware reports misc interrupt during pcie_flr(),
the driver can not receive this interrupt anymore, so disable it
when hclge_flr_prepare() return, and re-enable it when enter
hclge_flr_done().

Signed-off-by: Huazhong Tan <[email protected]>
---
.../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 85 +++++++++++-----------
.../ethernet/hisilicon/hns3/hns3pf/hclge_main.h | 1 +
2 files changed, 43 insertions(+), 43 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 9cbc0b6..89621b5 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3277,8 +3277,6 @@ static int hclge_reset_wait(struct hclge_dev *hdev)
reg = HCLGE_FUN_RST_ING;
reg_bit = HCLGE_FUN_RST_ING_B;
break;
- case HNAE3_FLR_RESET:
- break;
default:
dev_err(&hdev->pdev->dev,
"Wait for unsupported reset type: %d\n",
@@ -3286,20 +3284,6 @@ static int hclge_reset_wait(struct hclge_dev *hdev)
return -EINVAL;
}

- if (hdev->reset_type == HNAE3_FLR_RESET) {
- while (!test_bit(HNAE3_FLR_DONE, &hdev->flr_state) &&
- cnt++ < HCLGE_RESET_WAIT_CNT)
- msleep(HCLGE_RESET_WATI_MS);
-
- if (!test_bit(HNAE3_FLR_DONE, &hdev->flr_state)) {
- dev_err(&hdev->pdev->dev,
- "flr wait timeout: %u\n", cnt);
- return -EBUSY;
- }
-
- return 0;
- }
-
val = hclge_read_dev(&hdev->hw, reg);
while (hnae3_get_bit(val, reg_bit) && cnt < HCLGE_RESET_WAIT_CNT) {
msleep(HCLGE_RESET_WATI_MS);
@@ -3490,12 +3474,6 @@ static void hclge_do_reset(struct hclge_dev *hdev)
set_bit(HNAE3_FUNC_RESET, &hdev->reset_pending);
hclge_reset_task_schedule(hdev);
break;
- case HNAE3_FLR_RESET:
- dev_info(&pdev->dev, "FLR requested\n");
- /* schedule again to check later */
- set_bit(HNAE3_FLR_RESET, &hdev->reset_pending);
- hclge_reset_task_schedule(hdev);
- break;
default:
dev_warn(&pdev->dev,
"Unsupported reset type: %d\n", hdev->reset_type);
@@ -3650,10 +3628,6 @@ static int hclge_reset_prepare_wait(struct hclge_dev *hdev)
ret = hclge_func_reset_sync_vf(hdev);
if (ret)
return ret;
-
- set_bit(HCLGE_STATE_CMD_DISABLE, &hdev->state);
- set_bit(HNAE3_FLR_DOWN, &hdev->flr_state);
- hdev->rst_stats.flr_rst_cnt++;
break;
case HNAE3_IMP_RESET:
hclge_handle_imp_error(hdev);
@@ -3989,12 +3963,13 @@ static void hclge_reset_service_task(struct hclge_dev *hdev)
if (!test_and_clear_bit(HCLGE_STATE_RST_SERVICE_SCHED, &hdev->state))
return;

- if (test_and_set_bit(HCLGE_STATE_RST_HANDLING, &hdev->state))
- return;
+ down(&hdev->reset_sem);
+ set_bit(HCLGE_STATE_RST_HANDLING, &hdev->state);

hclge_reset_subtask(hdev);

clear_bit(HCLGE_STATE_RST_HANDLING, &hdev->state);
+ up(&hdev->reset_sem);
}

static void hclge_update_vport_alive(struct hclge_dev *hdev)
@@ -9341,30 +9316,53 @@ static void hclge_state_uninit(struct hclge_dev *hdev)

static void hclge_flr_prepare(struct hnae3_ae_dev *ae_dev)
{
-#define HCLGE_FLR_WAIT_MS 100
-#define HCLGE_FLR_WAIT_CNT 50
- struct hclge_dev *hdev = ae_dev->priv;
- int cnt = 0;
+#define HCLGE_FLR_RETRY_WAIT_MS 500
+#define HCLGE_FLR_RETRY_CNT 5

- clear_bit(HNAE3_FLR_DOWN, &hdev->flr_state);
- clear_bit(HNAE3_FLR_DONE, &hdev->flr_state);
- set_bit(HNAE3_FLR_RESET, &hdev->default_reset_request);
- hclge_reset_event(hdev->pdev, NULL);
+ struct hclge_dev *hdev = ae_dev->priv;
+ int retry_cnt = 0;
+ int ret;

- while (!test_bit(HNAE3_FLR_DOWN, &hdev->flr_state) &&
- cnt++ < HCLGE_FLR_WAIT_CNT)
- msleep(HCLGE_FLR_WAIT_MS);
+retry:
+ down(&hdev->reset_sem);
+ set_bit(HCLGE_STATE_RST_HANDLING, &hdev->state);
+ hdev->reset_type = HNAE3_FLR_RESET;
+ ret = hclge_reset_prepare(hdev);
+ if (ret) {
+ dev_err(&hdev->pdev->dev, "fail to prepare FLR, ret=%d\n",
+ ret);
+ if (hdev->reset_pending ||
+ retry_cnt++ < HCLGE_FLR_RETRY_CNT) {
+ dev_err(&hdev->pdev->dev,
+ "reset_pending:0x%lx, retry_cnt:%d\n",
+ hdev->reset_pending, retry_cnt);
+ clear_bit(HCLGE_STATE_RST_HANDLING, &hdev->state);
+ up(&hdev->reset_sem);
+ msleep(HCLGE_FLR_RETRY_WAIT_MS);
+ goto retry;
+ }
+ }

- if (!test_bit(HNAE3_FLR_DOWN, &hdev->flr_state))
- dev_err(&hdev->pdev->dev,
- "flr wait down timeout: %d\n", cnt);
+ /* disable misc vector before FLR done */
+ hclge_enable_vector(&hdev->misc_vector, false);
+ set_bit(HCLGE_STATE_CMD_DISABLE, &hdev->state);
+ hdev->rst_stats.flr_rst_cnt++;
}

static void hclge_flr_done(struct hnae3_ae_dev *ae_dev)
{
struct hclge_dev *hdev = ae_dev->priv;
+ int ret;
+
+ hclge_enable_vector(&hdev->misc_vector, true);

- set_bit(HNAE3_FLR_DONE, &hdev->flr_state);
+ ret = hclge_reset_rebuild(hdev);
+ if (ret)
+ dev_err(&hdev->pdev->dev, "fail to rebuild, ret=%d\n", ret);
+
+ hdev->reset_type = HNAE3_NONE_RESET;
+ clear_bit(HCLGE_STATE_RST_HANDLING, &hdev->state);
+ up(&hdev->reset_sem);
}

static void hclge_clear_resetting_state(struct hclge_dev *hdev)
@@ -9407,6 +9405,7 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)

mutex_init(&hdev->vport_lock);
spin_lock_init(&hdev->fd_rule_lock);
+ sema_init(&hdev->reset_sem, 1);

ret = hclge_pci_init(hdev);
if (ret) {
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
index 4e5cfda..1c1d6b3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
@@ -720,6 +720,7 @@ struct hclge_dev {
unsigned long reset_request; /* reset has been requested */
unsigned long reset_pending; /* client rst is pending to be served */
struct hclge_rst_stats rst_stats;
+ struct semaphore reset_sem; /* protect reset process */
u32 fw_version;
u16 num_vmdq_vport; /* Num vmdq vport this PF has set up */
u16 num_tqps; /* Num task queue pairs of this PF */
--
2.7.4

2020-01-11 08:36:12

by Huazhong Tan

[permalink] [raw]
Subject: [PATCH net-next 4/7] net: hns3: refactor the procedure of VF FLR

Currently, the actual work of VF FLR is handled in the reset task,
which is asynchronous. So in some case, if the preparing and
rebuilding are not done, then the VF FLR will trigger some problems,
for example, makes hardware go into chaos.

So this patch separates the process of VF FLR from reset task, and
adds a semaphore to serialize this reset and others.

When FLR's preparing fails, if there has other higher level reset
pending or failing times less than the HCLGE_FLR_RETRY_CNT, this
preparing should be retried, otherwise it will get into a wrong state.

BTW, while the hardware reports misc interrupt during pcie_flr(),
the driver can not receive this interrupt anymore, so disable it
when hclgevf_flr_prepare() return, and re-enable it when enter
hclgevf_flr_done().

Avoid declaring internal function hclgevf_enable_vector(), this patch
also moves its definition forward, and removes unused enum
hnae3_flr_state.

Signed-off-by: Huazhong Tan <[email protected]>
---
drivers/net/ethernet/hisilicon/hns3/hnae3.h | 5 -
.../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 114 ++++++++++-----------
.../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h | 1 +
3 files changed, 55 insertions(+), 65 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index 6b131ab..a3e4081 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -167,11 +167,6 @@ enum hnae3_reset_type {
HNAE3_MAX_RESET,
};

-enum hnae3_flr_state {
- HNAE3_FLR_DOWN,
- HNAE3_FLR_DONE,
-};
-
enum hnae3_port_base_vlan_state {
HNAE3_PORT_BASE_VLAN_DISABLE,
HNAE3_PORT_BASE_VLAN_ENABLE,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index 02b44d3..b26b8ad 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -1411,32 +1411,6 @@ static int hclgevf_notify_client(struct hclgevf_dev *hdev,
return ret;
}

-static void hclgevf_flr_done(struct hnae3_ae_dev *ae_dev)
-{
- struct hclgevf_dev *hdev = ae_dev->priv;
-
- set_bit(HNAE3_FLR_DONE, &hdev->flr_state);
-}
-
-static int hclgevf_flr_poll_timeout(struct hclgevf_dev *hdev,
- unsigned long delay_us,
- unsigned long wait_cnt)
-{
- unsigned long cnt = 0;
-
- while (!test_bit(HNAE3_FLR_DONE, &hdev->flr_state) &&
- cnt++ < wait_cnt)
- usleep_range(delay_us, delay_us * 2);
-
- if (!test_bit(HNAE3_FLR_DONE, &hdev->flr_state)) {
- dev_err(&hdev->pdev->dev,
- "flr wait timeout\n");
- return -ETIMEDOUT;
- }
-
- return 0;
-}
-
static int hclgevf_reset_wait(struct hclgevf_dev *hdev)
{
#define HCLGEVF_RESET_WAIT_US 20000
@@ -1447,11 +1421,7 @@ static int hclgevf_reset_wait(struct hclgevf_dev *hdev)
u32 val;
int ret;

- if (hdev->reset_type == HNAE3_FLR_RESET)
- return hclgevf_flr_poll_timeout(hdev,
- HCLGEVF_RESET_WAIT_US,
- HCLGEVF_RESET_WAIT_CNT);
- else if (hdev->reset_type == HNAE3_VF_RESET)
+ if (hdev->reset_type == HNAE3_VF_RESET)
ret = readl_poll_timeout(hdev->hw.io_base +
HCLGEVF_VF_RST_ING, val,
!(val & HCLGEVF_VF_RST_ING_BIT),
@@ -1533,18 +1503,10 @@ static int hclgevf_reset_prepare_wait(struct hclgevf_dev *hdev)

int ret = 0;

- switch (hdev->reset_type) {
- case HNAE3_VF_FUNC_RESET:
+ if (hdev->reset_type == HNAE3_VF_FUNC_RESET) {
ret = hclgevf_send_mbx_msg(hdev, HCLGE_MBX_RESET, 0, NULL,
0, true, NULL, sizeof(u8));
hdev->rst_stats.vf_func_rst_cnt++;
- break;
- case HNAE3_FLR_RESET:
- set_bit(HNAE3_FLR_DOWN, &hdev->flr_state);
- hdev->rst_stats.flr_rst_cnt++;
- break;
- default:
- break;
}

set_bit(HCLGEVF_STATE_CMD_DISABLE, &hdev->state);
@@ -1734,25 +1696,60 @@ static void hclgevf_set_def_reset_request(struct hnae3_ae_dev *ae_dev,
set_bit(rst_type, &hdev->default_reset_request);
}

+static void hclgevf_enable_vector(struct hclgevf_misc_vector *vector, bool en)
+{
+ writel(en ? 1 : 0, vector->addr);
+}
+
static void hclgevf_flr_prepare(struct hnae3_ae_dev *ae_dev)
{
-#define HCLGEVF_FLR_WAIT_MS 100
-#define HCLGEVF_FLR_WAIT_CNT 50
+#define HCLGEVF_FLR_RETRY_WAIT_MS 500
+#define HCLGEVF_FLR_RETRY_CNT 5
+
+ struct hclgevf_dev *hdev = ae_dev->priv;
+ int retry_cnt = 0;
+ int ret;
+
+retry:
+ down(&hdev->reset_sem);
+ set_bit(HCLGEVF_STATE_RST_HANDLING, &hdev->state);
+ hdev->reset_type = HNAE3_FLR_RESET;
+ ret = hclgevf_reset_prepare(hdev);
+ if (ret) {
+ dev_err(&hdev->pdev->dev, "fail to prepare FLR, ret=%d\n",
+ ret);
+ if (hdev->reset_pending ||
+ retry_cnt++ < HCLGEVF_FLR_RETRY_CNT) {
+ dev_err(&hdev->pdev->dev,
+ "reset_pending:0x%lx, retry_cnt:%d\n",
+ hdev->reset_pending, retry_cnt);
+ clear_bit(HCLGEVF_STATE_RST_HANDLING, &hdev->state);
+ up(&hdev->reset_sem);
+ msleep(HCLGEVF_FLR_RETRY_WAIT_MS);
+ goto retry;
+ }
+ }
+
+ /* disable misc vector before FLR done */
+ hclgevf_enable_vector(&hdev->misc_vector, false);
+ hdev->rst_stats.flr_rst_cnt++;
+}
+
+static void hclgevf_flr_done(struct hnae3_ae_dev *ae_dev)
+{
struct hclgevf_dev *hdev = ae_dev->priv;
- int cnt = 0;
+ int ret;

- clear_bit(HNAE3_FLR_DOWN, &hdev->flr_state);
- clear_bit(HNAE3_FLR_DONE, &hdev->flr_state);
- set_bit(HNAE3_FLR_RESET, &hdev->default_reset_request);
- hclgevf_reset_event(hdev->pdev, NULL);
+ hclgevf_enable_vector(&hdev->misc_vector, true);

- while (!test_bit(HNAE3_FLR_DOWN, &hdev->flr_state) &&
- cnt++ < HCLGEVF_FLR_WAIT_CNT)
- msleep(HCLGEVF_FLR_WAIT_MS);
+ ret = hclgevf_reset_rebuild(hdev);
+ if (ret)
+ dev_warn(&hdev->pdev->dev, "fail to rebuild, ret=%d\n",
+ ret);

- if (!test_bit(HNAE3_FLR_DOWN, &hdev->flr_state))
- dev_err(&hdev->pdev->dev,
- "flr wait down timeout: %d\n", cnt);
+ hdev->reset_type = HNAE3_NONE_RESET;
+ clear_bit(HCLGEVF_STATE_RST_HANDLING, &hdev->state);
+ up(&hdev->reset_sem);
}

static u32 hclgevf_get_fw_version(struct hnae3_handle *handle)
@@ -1808,8 +1805,8 @@ static void hclgevf_reset_service_task(struct hclgevf_dev *hdev)
if (!test_and_clear_bit(HCLGEVF_STATE_RST_SERVICE_SCHED, &hdev->state))
return;

- if (test_and_set_bit(HCLGEVF_STATE_RST_HANDLING, &hdev->state))
- return;
+ down(&hdev->reset_sem);
+ set_bit(HCLGEVF_STATE_RST_HANDLING, &hdev->state);

if (test_and_clear_bit(HCLGEVF_RESET_PENDING,
&hdev->reset_state)) {
@@ -1866,6 +1863,7 @@ static void hclgevf_reset_service_task(struct hclgevf_dev *hdev)
}

clear_bit(HCLGEVF_STATE_RST_HANDLING, &hdev->state);
+ up(&hdev->reset_sem);
}

static void hclgevf_mailbox_service_task(struct hclgevf_dev *hdev)
@@ -2009,11 +2007,6 @@ static enum hclgevf_evt_cause hclgevf_check_evt_cause(struct hclgevf_dev *hdev,
return HCLGEVF_VECTOR0_EVENT_OTHER;
}

-static void hclgevf_enable_vector(struct hclgevf_misc_vector *vector, bool en)
-{
- writel(en ? 1 : 0, vector->addr);
-}
-
static irqreturn_t hclgevf_misc_irq_handle(int irq, void *data)
{
enum hclgevf_evt_cause event_cause;
@@ -2288,6 +2281,7 @@ static void hclgevf_state_init(struct hclgevf_dev *hdev)
INIT_DELAYED_WORK(&hdev->service_task, hclgevf_service_task);

mutex_init(&hdev->mbx_resp.mbx_mutex);
+ sema_init(&hdev->reset_sem, 1);

/* bring the device down */
set_bit(HCLGEVF_STATE_DOWN, &hdev->state);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h
index 2cbc7df..fee8d97 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h
@@ -253,6 +253,7 @@ struct hclgevf_dev {
unsigned long reset_state; /* requested, pending */
struct hclgevf_rst_stats rst_stats;
u32 reset_attempts;
+ struct semaphore reset_sem; /* protect reset process */

u32 fw_version;
u16 num_tqps; /* num task queue pairs of this PF */
--
2.7.4

2020-01-11 08:36:16

by Huazhong Tan

[permalink] [raw]
Subject: [PATCH net-next 2/7] net: hns3: split hclgevf_reset() into preparing and rebuilding part

hclgevf_reset() is a little bloated, and the process of VF FLR will
be separated from the reset task later. So this patch splits
hclgevf_reset() into hclgevf_reset_prepare() and hclge_reset_rebuild(),
then FLR can also reuse these two functions. Also moves HNAE3_UP_CLIENT
into hclgevf_reset_stack().

Signed-off-by: Huazhong Tan <[email protected]>
---
.../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 81 +++++++++++-----------
1 file changed, 39 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index cfa797e..02b44d3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -1523,7 +1523,8 @@ static int hclgevf_reset_stack(struct hclgevf_dev *hdev)
/* clear handshake status with IMP */
hclgevf_reset_handshake(hdev, false);

- return 0;
+ /* bring up the nic to enable TX/RX again */
+ return hclgevf_notify_client(hdev, HNAE3_UP_CLIENT);
}

static int hclgevf_reset_prepare_wait(struct hclgevf_dev *hdev)
@@ -1603,7 +1604,7 @@ static void hclgevf_reset_err_handle(struct hclgevf_dev *hdev)
}
}

-static int hclgevf_reset(struct hclgevf_dev *hdev)
+static int hclgevf_reset_prepare(struct hclgevf_dev *hdev)
{
struct hnae3_ae_dev *ae_dev = pci_get_drvdata(hdev->pdev);
int ret;
@@ -1613,62 +1614,64 @@ static int hclgevf_reset(struct hclgevf_dev *hdev)
*/
ae_dev->reset_type = hdev->reset_type;
hdev->rst_stats.rst_cnt++;
- rtnl_lock();

+ rtnl_lock();
/* bring down the nic to stop any ongoing TX/RX */
ret = hclgevf_notify_client(hdev, HNAE3_DOWN_CLIENT);
- if (ret)
- goto err_reset_lock;
-
rtnl_unlock();
-
- ret = hclgevf_reset_prepare_wait(hdev);
if (ret)
- goto err_reset;
+ return ret;

- /* check if VF could successfully fetch the hardware reset completion
- * status from the hardware
- */
- ret = hclgevf_reset_wait(hdev);
- if (ret) {
- /* can't do much in this situation, will disable VF */
- dev_err(&hdev->pdev->dev,
- "VF failed(=%d) to fetch H/W reset completion status\n",
- ret);
- goto err_reset;
- }
+ return hclgevf_reset_prepare_wait(hdev);
+}
+
+static int hclgevf_reset_rebuild(struct hclgevf_dev *hdev)
+{
+ struct hnae3_ae_dev *ae_dev = pci_get_drvdata(hdev->pdev);
+ int ret;

hdev->rst_stats.hw_rst_done_cnt++;

rtnl_lock();
-
/* now, re-initialize the nic client and ae device */
ret = hclgevf_reset_stack(hdev);
+ rtnl_unlock();
if (ret) {
dev_err(&hdev->pdev->dev, "failed to reset VF stack\n");
- goto err_reset_lock;
+ return ret;
}

- /* bring up the nic to enable TX/RX again */
- ret = hclgevf_notify_client(hdev, HNAE3_UP_CLIENT);
- if (ret)
- goto err_reset_lock;
-
- rtnl_unlock();
-
hdev->last_reset_time = jiffies;
ae_dev->reset_type = HNAE3_NONE_RESET;
hdev->rst_stats.rst_done_cnt++;
hdev->rst_stats.rst_fail_cnt = 0;
clear_bit(HCLGEVF_STATE_RST_FAIL, &hdev->state);

- return ret;
-err_reset_lock:
- rtnl_unlock();
+ return 0;
+}
+
+static void hclgevf_reset(struct hclgevf_dev *hdev)
+{
+ if (hclgevf_reset_prepare(hdev))
+ goto err_reset;
+
+ /* check if VF could successfully fetch the hardware reset completion
+ * status from the hardware
+ */
+ if (hclgevf_reset_wait(hdev)) {
+ /* can't do much in this situation, will disable VF */
+ dev_err(&hdev->pdev->dev,
+ "failed to fetch H/W reset completion status\n");
+ goto err_reset;
+ }
+
+ if (hclgevf_reset_rebuild(hdev))
+ goto err_reset;
+
+ return;
+
err_reset:
hclgevf_reset_err_handle(hdev);
-
- return ret;
}

static enum hnae3_reset_type hclgevf_get_reset_level(struct hclgevf_dev *hdev,
@@ -1802,8 +1805,6 @@ static void hclgevf_reset_service_task(struct hclgevf_dev *hdev)
{
#define HCLGEVF_MAX_RESET_ATTEMPTS_CNT 3

- int ret;
-
if (!test_and_clear_bit(HCLGEVF_STATE_RST_SERVICE_SCHED, &hdev->state))
return;

@@ -1822,12 +1823,8 @@ static void hclgevf_reset_service_task(struct hclgevf_dev *hdev)
hdev->last_reset_time = jiffies;
while ((hdev->reset_type =
hclgevf_get_reset_level(hdev, &hdev->reset_pending))
- != HNAE3_NONE_RESET) {
- ret = hclgevf_reset(hdev);
- if (ret)
- dev_err(&hdev->pdev->dev,
- "VF stack reset failed %d.\n", ret);
- }
+ != HNAE3_NONE_RESET)
+ hclgevf_reset(hdev);
} else if (test_and_clear_bit(HCLGEVF_RESET_REQUESTED,
&hdev->reset_state)) {
/* we could be here when either of below happens:
--
2.7.4

2020-01-11 08:37:05

by Huazhong Tan

[permalink] [raw]
Subject: [PATCH net-next 5/7] net: hns3: enlarge HCLGE_RESET_WAIT_CNT

When the load of firmware is high, its reset task may takes
more time(which will be as long as 35 seconds). So this
patch modifies HCLGE_RESET_WAIT_CNT to match the firmware's.

Signed-off-by: Huazhong Tan <[email protected]>
---
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 89621b5..d81c976 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3260,7 +3260,8 @@ static int hclge_notify_roce_client(struct hclge_dev *hdev,
static int hclge_reset_wait(struct hclge_dev *hdev)
{
#define HCLGE_RESET_WATI_MS 100
-#define HCLGE_RESET_WAIT_CNT 200
+#define HCLGE_RESET_WAIT_CNT 350
+
u32 val, reg, reg_bit;
u32 cnt = 0;

--
2.7.4

2020-01-11 22:43:39

by David Miller

[permalink] [raw]
Subject: Re: [PATCH net-next 0/7] net: hns3: add some misc update about reset issue

From: Huazhong Tan <[email protected]>
Date: Sat, 11 Jan 2020 16:33:46 +0800

> This series includes some misc update relating to reset issue.
> [patch 1/7] & [patch 2/7] splits hclge_reset()/hclgevf_reset()
> into two parts: preparing and rebuilding. Since the procedure
> of FLR should be separated out from the reset task([patch 3/7 &
> patch 3/7]), then the FLR's processing can reuse these codes.
>
> pci_error_handlers.reset_prepare() is void type function, so
> [patch 6/7] & [patch 7/7] factor some codes related to PF
> function reset to make the preparing done before .reset_prepare()
> return.
>
> BTW, [patch 5/7] enlarges the waiting time of reset for matching
> the hardware's.

Series applied, thanks.