This patch fixes a TX hang on the Aspeed AST2600.
The AST2600 adds two HW arbitration features, but these features
cause the MAC TX path to hang when handling scatter-gather DMA. Both
problematic features can be disabled by setting bit 28 and bit 27 of
MAC register 0x58.
Dylan Hung (1):
net: ftgmac100: Fix Aspeed ast2600 TX hang issue
drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++
drivers/net/ethernet/faraday/ftgmac100.h | 8 ++++++++
2 files changed, 13 insertions(+)
--
2.17.1
The new HW arbitration feature on Aspeed ast2600 will cause MAC TX to
hang when handling scatter-gather DMA. Disable the problematic feature
by setting MAC register 0x58 bit28 and bit27.
Signed-off-by: Dylan Hung <[email protected]>
---
drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++
drivers/net/ethernet/faraday/ftgmac100.h | 8 ++++++++
2 files changed, 13 insertions(+)
diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 87236206366f..00024dd41147 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1817,6 +1817,11 @@ static int ftgmac100_probe(struct platform_device *pdev)
 		priv->rxdes0_edorr_mask = BIT(30);
 		priv->txdes0_edotr_mask = BIT(30);
 		priv->is_aspeed = true;
+		/* Disable ast2600 problematic HW arbitration */
+		if (of_device_is_compatible(np, "aspeed,ast2600-mac")) {
+			iowrite32(FTGMAC100_TM_DEFAULT,
+				  priv->base + FTGMAC100_OFFSET_TM);
+		}
 	} else {
 		priv->rxdes0_edorr_mask = BIT(15);
 		priv->txdes0_edotr_mask = BIT(15);
diff --git a/drivers/net/ethernet/faraday/ftgmac100.h b/drivers/net/ethernet/faraday/ftgmac100.h
index e5876a3fda91..63b3e02fab16 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.h
+++ b/drivers/net/ethernet/faraday/ftgmac100.h
@@ -169,6 +169,14 @@
#define FTGMAC100_MACCR_FAST_MODE (1 << 19)
#define FTGMAC100_MACCR_SW_RST (1 << 31)
+/*
+ * test mode control register
+ */
+#define FTGMAC100_TM_RQ_TX_VALID_DIS (1 << 28)
+#define FTGMAC100_TM_RQ_RR_IDLE_PREV (1 << 27)
+#define FTGMAC100_TM_DEFAULT \
+ (FTGMAC100_TM_RQ_TX_VALID_DIS | FTGMAC100_TM_RQ_RR_IDLE_PREV)
+
/*
* PHY control register
*/
--
2.17.1
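
For readers following the thread, here is a minimal user-space sketch of the register value the patch writes. The two bit definitions and the 0x58 offset come from the patch and its description; the `demo_*` names and the array standing in for the register window are hypothetical and exist only for illustration. Note that the patch writes the mask as the whole register value rather than doing a read-modify-write, and the sketch mirrors that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions and the 0x58 offset are taken from the patch description.
 * The sketch uses 1u instead of 1 to keep the shifts unsigned. */
#define FTGMAC100_OFFSET_TM		0x58
#define FTGMAC100_TM_RQ_TX_VALID_DIS	(1u << 28)
#define FTGMAC100_TM_RQ_RR_IDLE_PREV	(1u << 27)
#define FTGMAC100_TM_DEFAULT \
	(FTGMAC100_TM_RQ_TX_VALID_DIS | FTGMAC100_TM_RQ_RR_IDLE_PREV)

/* Hypothetical stand-in for the MAC register window (not real MMIO). */
#define DEMO_REG_WINDOW_BYTES	0x100
static uint32_t demo_regs[DEMO_REG_WINDOW_BYTES / 4];

/* Hypothetical helper mimicking a 32-bit register write at a byte offset. */
static void demo_write32(uint32_t val, size_t offset)
{
	assert(offset % 4 == 0 && offset < DEMO_REG_WINDOW_BYTES);
	demo_regs[offset / 4] = val;
}

/* Hypothetical helper mimicking a 32-bit register read at a byte offset. */
static uint32_t demo_read32(size_t offset)
{
	assert(offset % 4 == 0 && offset < DEMO_REG_WINDOW_BYTES);
	return demo_regs[offset / 4];
}

/* Mirrors the probe-time write: the whole register is set to the mask. */
static void demo_disable_hw_arbitration(void)
{
	demo_write32(FTGMAC100_TM_DEFAULT, FTGMAC100_OFFSET_TM);
}
```

After calling `demo_disable_hw_arbitration()`, `demo_read32(FTGMAC100_OFFSET_TM)` returns 0x18000000, that is, bit 28 and bit 27 set and every other bit clear.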
> -----Original Message-----
> From: Joel Stanley [mailto:[email protected]]
> Sent: Thursday, October 15, 2020 6:31 AM
> To: Dylan Hung <[email protected]>
> Cc: David S . Miller <[email protected]>; Jakub Kicinski
> <[email protected]>; [email protected]; Linux Kernel Mailing List
> <[email protected]>; Po-Yu Chuang <[email protected]>;
> linux-aspeed <[email protected]>; OpenBMC Maillist
> <[email protected]>; BMC-SW <[email protected]>
> Subject: Re: [PATCH 1/1] net: ftgmac100: Fix Aspeed ast2600 TX hang issue
>
> On Wed, 14 Oct 2020 at 13:32, Dylan Hung <[email protected]>
> wrote:
> > > > The new HW arbitration feature on Aspeed ast2600 will cause MAC TX
> > > > to hang when handling scatter-gather DMA. Disable the problematic
> > > > feature by setting MAC register 0x58 bit28 and bit27.
> > >
> > > Hi Dylan,
> > >
> > > What are the symptoms of this issue? We are seeing this on our systems:
> > >
> > > [29376.090637] WARNING: CPU: 0 PID: 9 at net/sched/sch_generic.c:442
> > > dev_watchdog+0x2f0/0x2f4
> > > [29376.099898] NETDEV WATCHDOG: eth0 (ftgmac100): transmit queue 0
> > > timed out
> > >
> >
> > May I know your soc version? This issue happens on ast2600 version A1.
> > The registers to fix this issue are meaningless/reserved on A0 chip, so it is
> > okay to set them on either A0 or A1.
>
> We are running the A1. All of our A0 parts have been replaced with A1.
>
> > I was encountering this issue when I was running the iperf TX test. The
> > symptom is the TX descriptors are consumed, but no complete packet is sent
> > out.
>
> What parameters are you using for iperf? I did a lot of testing with
> iperf3 (and stress-ng running at the same time) and couldn't reproduce the
> error.
>
I simply run "iperf -c <server ip>" on the ast2600. It is very easy to reproduce.
Note that this issue only happens when HW scatter-gather (NETIF_F_SG) is on.
The log is appended below:
[AST /]$ iperf3 -c 192.168.100.89
Connecting to host 192.168.100.89, port 5201
[ 4] local 192.168.100.45 port 45346 connected to 192.168.100.89 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 44.8 MBytes 375 Mbits/sec 2 1.43 KBytes
[ 4] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec 2 1.43 KBytes
[ 4] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 0 1.43 KBytes
[ 4] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec 1 1.43 KBytes
[ 4] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec 0 1.43 KBytes
^C[ 4] 5.00-5.88 sec 0.00 Bytes 0.00 bits/sec 0 1.43 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-5.88 sec 44.8 MBytes 64.0 Mbits/sec 5 sender
[ 4] 0.00-5.88 sec 0.00 Bytes 0.00 bits/sec receiver
iperf3: interrupt - the client has terminated
> We could only reproduce it when performing other functions, such as
> debugging/booting the host processor.
>
Could it be another issue?
> > > > +/*
> > > > + * test mode control register
> > > > + */
> > > > +#define FTGMAC100_TM_RQ_TX_VALID_DIS (1 << 28)
> > > > +#define FTGMAC100_TM_RQ_RR_IDLE_PREV (1 << 27)
> > > > +#define FTGMAC100_TM_DEFAULT \
> > > > + (FTGMAC100_TM_RQ_TX_VALID_DIS | FTGMAC100_TM_RQ_RR_IDLE_PREV)
> > >
> > > Will aspeed issue an updated datasheet with this register documented?
>
> Did you see this question?
>
Sorry, I missed this question. Aspeed will update the datasheet accordingly.
> Cheers,
>
> Joel
On Thu, 15 Oct 2020 at 01:49, Dylan Hung <[email protected]> wrote:
> > > I was encountering this issue when I was running the iperf TX test. The
> > > symptom is the TX descriptors are consumed, but no complete packet is sent
> > > out.
> >
> > What parameters are you using for iperf? I did a lot of testing with
> > iperf3 (and stress-ng running at the same time) and couldn't reproduce the
> > error.
> >
>
> I simply use "iperf -c <server ip>" on ast2600. It is very easy to reproduce. I append the log below:
> Noticed that this issue only happens when HW scatter-gather (NETIF_F_SG) is on.
OK. This appears to be on by default in
drivers/net/ethernet/faraday/ftgmac100.c:
netdev->hw_features = NETIF_F_RXCSUM | NETIF_F_HW_CSUM |
NETIF_F_GRO | NETIF_F_SG | NETIF_F_HW_VLAN_CTAG_RX |
NETIF_F_HW_VLAN_CTAG_TX;
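
As a rough illustration of how such a feature mask is tested and how scatter-gather could be masked out of it, here is a small stand-alone sketch. The `DEMO_F_*` bit values and the `demo_*` helpers are placeholders invented for this example; they are not the kernel's NETIF_F_* definitions and not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration only; NOT the kernel's values. */
#define DEMO_F_SG	(1u << 0)
#define DEMO_F_HW_CSUM	(1u << 1)
#define DEMO_F_RXCSUM	(1u << 2)
#define DEMO_F_GRO	(1u << 3)

/* Returns true when the scatter-gather bit is set in the feature mask. */
static bool demo_sg_enabled(uint32_t features)
{
	return (features & DEMO_F_SG) != 0;
}

/* Returns a copy of the mask with only the scatter-gather bit cleared. */
static uint32_t demo_disable_sg(uint32_t features)
{
	uint32_t out = features & ~DEMO_F_SG;

	assert(!demo_sg_enabled(out));
	return out;
}
```

With a mask built like the one above, `demo_sg_enabled()` reports whether the scatter-gather bit is present, and `demo_disable_sg()` leaves every other feature bit untouched.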
> [AST /]$ iperf3 -c 192.168.100.89
> Connecting to host 192.168.100.89, port 5201
> [ 4] local 192.168.100.45 port 45346 connected to 192.168.100.89 port 5201
> [ ID] Interval Transfer Bandwidth Retr Cwnd
> [ 4] 0.00-1.00 sec 44.8 MBytes 375 Mbits/sec 2 1.43 KBytes
> [ 4] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec 2 1.43 KBytes
> [ 4] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 0 1.43 KBytes
> [ 4] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec 1 1.43 KBytes
> [ 4] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec 0 1.43 KBytes
> ^C[ 4] 5.00-5.88 sec 0.00 Bytes 0.00 bits/sec 0 1.43 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bandwidth Retr
> [ 4] 0.00-5.88 sec 44.8 MBytes 64.0 Mbits/sec 5 sender
> [ 4] 0.00-5.88 sec 0.00 Bytes 0.00 bits/sec receiver
> iperf3: interrupt - the client has terminated
I just realised my test machine must be on a 100Mbit network. I will
try testing on a gigabit network.
> > We could only reproduce it when performing other functions, such as
> > debugging/booting the host processor.
> >
> Could it be another issue?
I hope not! We have deployed your patch on our systems and I will let
you know if we see the bug again.
> > > > > +/*
> > > > > + * test mode control register
> > > > > + */
> > > > > +#define FTGMAC100_TM_RQ_TX_VALID_DIS (1 << 28)
> > > > > +#define FTGMAC100_TM_RQ_RR_IDLE_PREV (1 << 27)
> > > > > +#define FTGMAC100_TM_DEFAULT \
> > > > > + (FTGMAC100_TM_RQ_TX_VALID_DIS | FTGMAC100_TM_RQ_RR_IDLE_PREV)
> > > >
> > > > Will aspeed issue an updated datasheet with this register documented?
> >
> > Did you see this question?
> >
> Sorry, I missed this question. Aspeed will update the datasheet accordingly.
Thank you.
On Wed, 14 Oct 2020 14:06:32 +0800 Dylan Hung wrote:
> The new HW arbitration feature on Aspeed ast2600 will cause MAC TX to
> hang when handling scatter-gather DMA. Disable the problematic feature
> by setting MAC register 0x58 bit28 and bit27.
>
> Signed-off-by: Dylan Hung <[email protected]>
Applied, thank you.