2022-07-05 06:38:28

by Qiao Ma

[permalink] [raw]
Subject: [PATCH net-next v3 2/3] net: hinic: avoid kernel hung in hinic_get_stats64()

When using hinic device as a bond slave device, and reading device stats
of master bond device, the kernel may hung.

The kernel panic calltrace as follows:
Kernel panic - not syncing: softlockup: hung tasks
Call trace:
native_queued_spin_lock_slowpath+0x1ec/0x31c
dev_get_stats+0x60/0xcc
dev_seq_printf_stats+0x40/0x120
dev_seq_show+0x1c/0x40
seq_read_iter+0x3c8/0x4dc
seq_read+0xe0/0x130
proc_reg_read+0xa8/0xe0
vfs_read+0xb0/0x1d4
ksys_read+0x70/0xfc
__arm64_sys_read+0x20/0x30
el0_svc_common+0x88/0x234
do_el0_svc+0x2c/0x90
el0_svc+0x1c/0x30
el0_sync_handler+0xa8/0xb0
el0_sync+0x148/0x180

And the calltrace of task that actually caused kernel hungs as follows:
__switch_to+124
__schedule+548
schedule+72
schedule_timeout+348
__down_common+188
__down+24
down+104
hinic_get_stats64+44 [hinic]
dev_get_stats+92
bond_get_stats+172 [bonding]
dev_get_stats+92
dev_seq_printf_stats+60
dev_seq_show+24
seq_read_iter+964
seq_read+220
proc_reg_read+164
vfs_read+172
ksys_read+108
__arm64_sys_read+28
el0_svc_common+132
do_el0_svc+40
el0_svc+24
el0_sync_handler+164
el0_sync+324

When getting device stats from bond, kernel will call bond_get_stats().
It first holds the spinlock bond->stats_lock, and then call
hinic_get_stats64() to collect hinic device's stats.
However, hinic_get_stats64() calls `down(&nic_dev->mgmt_lock)` to
protect its critical section, which may schedule current task out.
And if system is under high pressure, the task cannot be woken up
immediately, which eventually triggers kernel hung panic.

Since previous patch has replaced hinic_dev.tx_stats/rx_stats with local
variable in hinic_get_stats64(), there is nothing need to be protected
by lock, so just removing down()/up() is ok.

Fixes: edd384f682cc ("net-next/hinic: Add ethtool and stats")
Signed-off-by: Qiao Ma <[email protected]>
---
drivers/net/ethernet/huawei/hinic/hinic_main.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_main.c b/drivers/net/ethernet/huawei/hinic/hinic_main.c
index 0fc048a7b244..57ed73653792 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_main.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_main.c
@@ -849,10 +849,8 @@ static void hinic_get_stats64(struct net_device *netdev,
u64_stats_init(&nic_rx_stats.syncp);
u64_stats_init(&nic_tx_stats.syncp);

- down(&nic_dev->mgmt_lock);
if (nic_dev->flags & HINIC_INTF_UP)
gather_nic_stats(nic_dev, &nic_rx_stats, &nic_tx_stats);
- up(&nic_dev->mgmt_lock);

stats->rx_bytes = nic_rx_stats.bytes;
stats->rx_packets = nic_rx_stats.pkts;
--
1.8.3.1


2022-07-05 06:45:35

by Qiao Ma

[permalink] [raw]
Subject: [PATCH net-next v3 3/3] net: hinic: fix bug that u64_stats_sync is not initialized

In get_drv_queue_stats(), the local variable {txq|rxq}_stats
should be initialized first before calling into
hinic_{rxq|txq}_get_stats(), this patch fixes it.

Fixes: edd384f682cc ("net-next/hinic: Add ethtool and stats")
Signed-off-by: Qiao Ma <[email protected]>
---
drivers/net/ethernet/huawei/hinic/hinic_ethtool.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c b/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c
index 93192f58ac88..75e9711bd2ba 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c
@@ -1371,6 +1371,9 @@ static void get_drv_queue_stats(struct hinic_dev *nic_dev, u64 *data)
u16 i = 0, j = 0, qid = 0;
char *p;

+ u64_stats_init(&txq_stats.syncp);
+ u64_stats_init(&rxq_stats.syncp);
+
for (qid = 0; qid < nic_dev->num_qps; qid++) {
if (!nic_dev->txqs)
break;
--
1.8.3.1

2022-07-05 08:56:41

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH net-next v3 3/3] net: hinic: fix bug that u64_stats_sync is not initialized

On Tue, Jul 5, 2022 at 8:26 AM Qiao Ma <[email protected]> wrote:
>
> In get_drv_queue_stats(), the local variable {txq|rxq}_stats
> should be initialized first before calling into
> hinic_{rxq|txq}_get_stats(), this patch fixes it.
>
> Fixes: edd384f682cc ("net-next/hinic: Add ethtool and stats")
> Signed-off-by: Qiao Ma <[email protected]>
> ---
> drivers/net/ethernet/huawei/hinic/hinic_ethtool.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c b/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c
> index 93192f58ac88..75e9711bd2ba 100644
> --- a/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c
> +++ b/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c
> @@ -1371,6 +1371,9 @@ static void get_drv_queue_stats(struct hinic_dev *nic_dev, u64 *data)
> u16 i = 0, j = 0, qid = 0;
> char *p;
>
> + u64_stats_init(&txq_stats.syncp);
> + u64_stats_init(&rxq_stats.syncp);
> +

This is wrong really.

txq_stats and rxq_stats are local variables in get_drv_queue_stats()

It makes little sense to use u64_stats infra on them, because they are
not visible to other cpus/threads in the host.

Please remove this confusion, instead of consolidating it.

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_main.c
b/drivers/net/ethernet/huawei/hinic/hinic_main.c
index 56a89793f47d4209b9e0dc3a122801d476e61381..edaac5a33458d51a3fb3e75c5fbe5bec8385f688
100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_main.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_main.c
@@ -85,8 +85,6 @@ static void update_rx_stats(struct hinic_dev
*nic_dev, struct hinic_rxq *rxq)
struct hinic_rxq_stats *nic_rx_stats = &nic_dev->rx_stats;
struct hinic_rxq_stats rx_stats;

- u64_stats_init(&rx_stats.syncp);
-
hinic_rxq_get_stats(rxq, &rx_stats);

u64_stats_update_begin(&nic_rx_stats->syncp);
@@ -105,8 +103,6 @@ static void update_tx_stats(struct hinic_dev
*nic_dev, struct hinic_txq *txq)
struct hinic_txq_stats *nic_tx_stats = &nic_dev->tx_stats;
struct hinic_txq_stats tx_stats;

- u64_stats_init(&tx_stats.syncp);
-
hinic_txq_get_stats(txq, &tx_stats);

u64_stats_update_begin(&nic_tx_stats->syncp);
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_rx.c
b/drivers/net/ethernet/huawei/hinic/hinic_rx.c
index 24b7b819dbfbad1d64116ef54058ee4887d7a056..4edf4c52787051aebc512094741bda30de27e2f0
100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_rx.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_rx.c
@@ -66,14 +66,13 @@ void hinic_rxq_clean_stats(struct hinic_rxq *rxq)
/**
* hinic_rxq_get_stats - get statistics of Rx Queue
* @rxq: Logical Rx Queue
- * @stats: return updated stats here
+ * @stats: return updated stats here (private to caller)
**/
void hinic_rxq_get_stats(struct hinic_rxq *rxq, struct hinic_rxq_stats *stats)
{
- struct hinic_rxq_stats *rxq_stats = &rxq->rxq_stats;
+ const struct hinic_rxq_stats *rxq_stats = &rxq->rxq_stats;
unsigned int start;

- u64_stats_update_begin(&stats->syncp);
do {
start = u64_stats_fetch_begin(&rxq_stats->syncp);
stats->pkts = rxq_stats->pkts;
@@ -83,7 +82,6 @@ void hinic_rxq_get_stats(struct hinic_rxq *rxq,
struct hinic_rxq_stats *stats)
stats->csum_errors = rxq_stats->csum_errors;
stats->other_errors = rxq_stats->other_errors;
} while (u64_stats_fetch_retry(&rxq_stats->syncp, start));
- u64_stats_update_end(&stats->syncp);
}

/**
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_tx.c
b/drivers/net/ethernet/huawei/hinic/hinic_tx.c
index 87408e7bb8097de6fced7f0f2d170179b3fe93a9..2d97add1107f08f088b68a823767a92cbc6bbbdf
100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_tx.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_tx.c
@@ -91,14 +91,13 @@ void hinic_txq_clean_stats(struct hinic_txq *txq)
/**
* hinic_txq_get_stats - get statistics of Tx Queue
* @txq: Logical Tx Queue
- * @stats: return updated stats here
+ * @stats: return updated stats here (private to caller)
**/
void hinic_txq_get_stats(struct hinic_txq *txq, struct hinic_txq_stats *stats)
{
- struct hinic_txq_stats *txq_stats = &txq->txq_stats;
+ const struct hinic_txq_stats *txq_stats = &txq->txq_stats;
unsigned int start;

- u64_stats_update_begin(&stats->syncp);
do {
start = u64_stats_fetch_begin(&txq_stats->syncp);
stats->pkts = txq_stats->pkts;
@@ -108,7 +107,6 @@ void hinic_txq_get_stats(struct hinic_txq *txq,
struct hinic_txq_stats *stats)
stats->tx_dropped = txq_stats->tx_dropped;
stats->big_frags_pkts = txq_stats->big_frags_pkts;
} while (u64_stats_fetch_retry(&txq_stats->syncp, start));
- u64_stats_update_end(&stats->syncp);
}

/**

2022-07-05 09:27:40

by Qiao Ma

[permalink] [raw]
Subject: Re: [PATCH net-next v3 3/3] net: hinic: fix bug that u64_stats_sync is not initialized



在 2022/7/5 下午4:53, Eric Dumazet 写道:
> On Tue, Jul 5, 2022 at 8:26 AM Qiao Ma <[email protected]> wrote:
>>
>> In get_drv_queue_stats(), the local variable {txq|rxq}_stats
>> should be initialized first before calling into
>> hinic_{rxq|txq}_get_stats(), this patch fixes it.
>>
>> Fixes: edd384f682cc ("net-next/hinic: Add ethtool and stats")
>> Signed-off-by: Qiao Ma <[email protected]>
>> ---
>> drivers/net/ethernet/huawei/hinic/hinic_ethtool.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c b/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c
>> index 93192f58ac88..75e9711bd2ba 100644
>> --- a/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c
>> +++ b/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c
>> @@ -1371,6 +1371,9 @@ static void get_drv_queue_stats(struct hinic_dev *nic_dev, u64 *data)
>> u16 i = 0, j = 0, qid = 0;
>> char *p;
>>
>> + u64_stats_init(&txq_stats.syncp);
>> + u64_stats_init(&rxq_stats.syncp);
>> +
>
> This is wrong really.
>
> txq_stats and rxq_stats are local variables in get_drv_queue_stats()
>
> It makes little sense to use u64_stats infra on them, because they are
> not visible to other cpus/threads in the host.
>
> Please remove this confusion, instead of consolidating it.
Thanks, I'll remove it in next version.
>
> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_main.c
> b/drivers/net/ethernet/huawei/hinic/hinic_main.c
> index 56a89793f47d4209b9e0dc3a122801d476e61381..edaac5a33458d51a3fb3e75c5fbe5bec8385f688
> 100644
> --- a/drivers/net/ethernet/huawei/hinic/hinic_main.c
> +++ b/drivers/net/ethernet/huawei/hinic/hinic_main.c
> @@ -85,8 +85,6 @@ static void update_rx_stats(struct hinic_dev
> *nic_dev, struct hinic_rxq *rxq)
> struct hinic_rxq_stats *nic_rx_stats = &nic_dev->rx_stats;
> struct hinic_rxq_stats rx_stats;
>
> - u64_stats_init(&rx_stats.syncp);
> -
> hinic_rxq_get_stats(rxq, &rx_stats);
>
> u64_stats_update_begin(&nic_rx_stats->syncp);
> @@ -105,8 +103,6 @@ static void update_tx_stats(struct hinic_dev
> *nic_dev, struct hinic_txq *txq)
> struct hinic_txq_stats *nic_tx_stats = &nic_dev->tx_stats;
> struct hinic_txq_stats tx_stats;
>
> - u64_stats_init(&tx_stats.syncp);
> -
> hinic_txq_get_stats(txq, &tx_stats);
>
> u64_stats_update_begin(&nic_tx_stats->syncp);
> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_rx.c
> b/drivers/net/ethernet/huawei/hinic/hinic_rx.c
> index 24b7b819dbfbad1d64116ef54058ee4887d7a056..4edf4c52787051aebc512094741bda30de27e2f0
> 100644
> --- a/drivers/net/ethernet/huawei/hinic/hinic_rx.c
> +++ b/drivers/net/ethernet/huawei/hinic/hinic_rx.c
> @@ -66,14 +66,13 @@ void hinic_rxq_clean_stats(struct hinic_rxq *rxq)
> /**
> * hinic_rxq_get_stats - get statistics of Rx Queue
> * @rxq: Logical Rx Queue
> - * @stats: return updated stats here
> + * @stats: return updated stats here (private to caller)
> **/
> void hinic_rxq_get_stats(struct hinic_rxq *rxq, struct hinic_rxq_stats *stats)
> {
> - struct hinic_rxq_stats *rxq_stats = &rxq->rxq_stats;
> + const struct hinic_rxq_stats *rxq_stats = &rxq->rxq_stats;
> unsigned int start;
>
> - u64_stats_update_begin(&stats->syncp);
> do {
> start = u64_stats_fetch_begin(&rxq_stats->syncp);
> stats->pkts = rxq_stats->pkts;
> @@ -83,7 +82,6 @@ void hinic_rxq_get_stats(struct hinic_rxq *rxq,
> struct hinic_rxq_stats *stats)
> stats->csum_errors = rxq_stats->csum_errors;
> stats->other_errors = rxq_stats->other_errors;
> } while (u64_stats_fetch_retry(&rxq_stats->syncp, start));
> - u64_stats_update_end(&stats->syncp);
> }
>
> /**
> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_tx.c
> b/drivers/net/ethernet/huawei/hinic/hinic_tx.c
> index 87408e7bb8097de6fced7f0f2d170179b3fe93a9..2d97add1107f08f088b68a823767a92cbc6bbbdf
> 100644
> --- a/drivers/net/ethernet/huawei/hinic/hinic_tx.c
> +++ b/drivers/net/ethernet/huawei/hinic/hinic_tx.c
> @@ -91,14 +91,13 @@ void hinic_txq_clean_stats(struct hinic_txq *txq)
> /**
> * hinic_txq_get_stats - get statistics of Tx Queue
> * @txq: Logical Tx Queue
> - * @stats: return updated stats here
> + * @stats: return updated stats here (private to caller)
> **/
> void hinic_txq_get_stats(struct hinic_txq *txq, struct hinic_txq_stats *stats)
> {
> - struct hinic_txq_stats *txq_stats = &txq->txq_stats;
> + const struct hinic_txq_stats *txq_stats = &txq->txq_stats;
> unsigned int start;
>
> - u64_stats_update_begin(&stats->syncp);
> do {
> start = u64_stats_fetch_begin(&txq_stats->syncp);
> stats->pkts = txq_stats->pkts;
> @@ -108,7 +107,6 @@ void hinic_txq_get_stats(struct hinic_txq *txq,
> struct hinic_txq_stats *stats)
> stats->tx_dropped = txq_stats->tx_dropped;
> stats->big_frags_pkts = txq_stats->big_frags_pkts;
> } while (u64_stats_fetch_retry(&txq_stats->syncp, start));
> - u64_stats_update_end(&stats->syncp);
> }
>
> /**