2021-05-11 19:07:05

by Gatis Peisenieks

[permalink] [raw]
Subject: [PATCH net-next 0/4] atl1c: support for Mikrotik 10/25G NIC features

The new Mikrotik 10/25G NIC maintains compatibility with existing atl1c
driver. However it does have new features.

This patch set adds support for reporting cards higher link speed, max-mtu,
enables rx csum offload and improves tx performance.

Gatis Peisenieks (4):
atl1c: show correct link speed on Mikrotik 10/25G NIC
atl1c: improve performance by avoiding unnecessary pcie writes on xmit
atl1c: adjust max mtu according to Mikrotik 10/25G NIC ability
atl1c: enable rx csum offload on Mikrotik 10/25G NIC

drivers/net/ethernet/atheros/atl1c/atl1c.h | 3 ++
drivers/net/ethernet/atheros/atl1c/atl1c_hw.c | 9 +++++
drivers/net/ethernet/atheros/atl1c/atl1c_hw.h | 7 ++++
.../net/ethernet/atheros/atl1c/atl1c_main.c | 33 +++++++++++++++----
4 files changed, 46 insertions(+), 6 deletions(-)


base-commit: 3913ba732e972d88ebc391323999e780a9295852
--
2.31.1


2021-05-11 19:07:13

by Gatis Peisenieks

[permalink] [raw]
Subject: [PATCH net-next 1/4] atl1c: show correct link speed on Mikrotik 10/25G NIC

The new Mikrotik 10/25G NIC maintains compatibility with existing atl1c
driver. However it does have new features.

This defines some new register offsets, code for identifying the new type
of NIC and correct speed detection for the NIC.

Signed-off-by: Gatis Peisenieks <[email protected]>
---
drivers/net/ethernet/atheros/atl1c/atl1c.h | 1 +
drivers/net/ethernet/atheros/atl1c/atl1c_hw.c | 9 +++++++++
drivers/net/ethernet/atheros/atl1c/atl1c_hw.h | 7 +++++++
drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 4 ++++
4 files changed, 21 insertions(+)

diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c.h b/drivers/net/ethernet/atheros/atl1c/atl1c.h
index 28ae5c16831e..3fda7eb3bd69 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c.h
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c.h
@@ -289,6 +289,7 @@ enum atl1c_nic_type {
athr_l2c_b2,
athr_l1d,
athr_l1d_2,
+ athr_mt,
};

enum atl1c_trans_queue {
diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_hw.c b/drivers/net/ethernet/atheros/atl1c/atl1c_hw.c
index 140358dcf61e..ddb9442416cd 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_hw.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_hw.c
@@ -648,6 +648,15 @@ int atl1c_get_speed_and_duplex(struct atl1c_hw *hw, u16 *speed, u16 *duplex)
int err;
u16 phy_data;

+ if (hw->nic_type == athr_mt) {
+ u32 spd;
+
+ AT_READ_REG(hw, REG_MT_SPEED, &spd);
+ *speed = spd;
+ *duplex = FULL_DUPLEX;
+ return 0;
+ }
+
/* Read PHY Specific Status Register (17) */
err = atl1c_read_phy_reg(hw, MII_GIGA_PSSR, &phy_data);
if (err)
diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_hw.h b/drivers/net/ethernet/atheros/atl1c/atl1c_hw.h
index ce1a123dce2c..73cbc049a63e 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_hw.h
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_hw.h
@@ -764,6 +764,13 @@ void atl1c_post_phy_linkchg(struct atl1c_hw *hw, u16 link_speed);
#define REG_DEBUG_DATA0 0x1900
#define REG_DEBUG_DATA1 0x1904

+#define REG_MT_MAGIC 0x1F00
+#define REG_MT_MODE 0x1F04
+#define REG_MT_SPEED 0x1F08
+#define REG_MT_VERSION 0x1F0C
+
+#define MT_MAGIC 0xaabb1234
+
#define L1D_MPW_PHYID1 0xD01C /* V7 */
#define L1D_MPW_PHYID2 0xD01D /* V1-V6 */
#define L1D_MPW_PHYID3 0xD01E /* V8 */
diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index c6263cf8d3c0..28c30d5288e4 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -644,6 +644,7 @@ static int atl1c_alloc_queues(struct atl1c_adapter *adapter)

static void atl1c_set_mac_type(struct atl1c_hw *hw)
{
+ u32 magic;
switch (hw->device_id) {
case PCI_DEVICE_ID_ATTANSIC_L2C:
hw->nic_type = athr_l2c;
@@ -662,6 +663,9 @@ static void atl1c_set_mac_type(struct atl1c_hw *hw)
break;
case PCI_DEVICE_ID_ATHEROS_L1D_2_0:
hw->nic_type = athr_l1d_2;
+ AT_READ_REG(hw, REG_MT_MAGIC, &magic);
+ if (magic == MT_MAGIC)
+ hw->nic_type = athr_mt;
break;
default:
break;
--
2.31.1

2021-05-11 19:08:25

by Gatis Peisenieks

[permalink] [raw]
Subject: [PATCH net-next 2/4] atl1c: improve performance by avoiding unnecessary pcie writes on xmit

The kernel has xmit_more facility that hints the networking driver xmit
path about whether more packets are coming soon. This information can be
used to avoid unnecessary expensive PCIe transaction per tx packet at a
slight increase in latency.

Max TX pps on Mikrotik 10/25G NIC in a Threadripper 3960X system
improved from 1150Kpps to 1700Kpps.

Signed-off-by: Gatis Peisenieks <[email protected]>
---
drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index 28c30d5288e4..2a8ab51b0ed9 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -2211,8 +2211,8 @@ static int atl1c_tx_map(struct atl1c_adapter *adapter,
return -1;
}

-static void atl1c_tx_queue(struct atl1c_adapter *adapter, struct sk_buff *skb,
- struct atl1c_tpd_desc *tpd, enum atl1c_trans_queue type)
+static void atl1c_tx_queue(struct atl1c_adapter *adapter,
+ enum atl1c_trans_queue type)
{
struct atl1c_tpd_ring *tpd_ring = &adapter->tpd_ring[type];
u16 reg;
@@ -2238,6 +2238,7 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,

if (atl1c_tpd_avail(adapter, type) < tpd_req) {
/* no enough descriptor, just stop queue */
+ atl1c_tx_queue(adapter, type);
netif_stop_queue(netdev);
return NETDEV_TX_BUSY;
}
@@ -2246,6 +2247,7 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,

/* do TSO and check sum */
if (atl1c_tso_csum(adapter, skb, &tpd, type) != 0) {
+ atl1c_tx_queue(adapter, type);
dev_kfree_skb_any(skb);
return NETDEV_TX_OK;
}
@@ -2270,8 +2272,11 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,
atl1c_tx_rollback(adapter, tpd, type);
dev_kfree_skb_any(skb);
} else {
- netdev_sent_queue(adapter->netdev, skb->len);
- atl1c_tx_queue(adapter, skb, tpd, type);
+ bool more = netdev_xmit_more();
+
+ __netdev_sent_queue(adapter->netdev, skb->len, more);
+ if (!more)
+ atl1c_tx_queue(adapter, type);
}

return NETDEV_TX_OK;
--
2.31.1

2021-05-11 19:08:28

by Gatis Peisenieks

[permalink] [raw]
Subject: [PATCH net-next 3/4] atl1c: adjust max mtu according to Mikrotik 10/25G NIC ability

The new Mikrotik 10/25G NIC supports jumbo frames. Jumbo frames are
supported for TSO as well.

This enables the support for mtu up to 9500 bytes.

Signed-off-by: Gatis Peisenieks <[email protected]>
---
drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index 2a8ab51b0ed9..c34ad1c3a532 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -478,6 +478,8 @@ static void atl1c_set_rxbufsize(struct atl1c_adapter *adapter,
static netdev_features_t atl1c_fix_features(struct net_device *netdev,
netdev_features_t features)
{
+ struct atl1c_adapter *adapter = netdev_priv(netdev);
+ struct atl1c_hw *hw = &adapter->hw;
/*
* Since there is no support for separate rx/tx vlan accel
* enable/disable make sure tx flag is always in same state as rx.
@@ -487,8 +489,10 @@ static netdev_features_t atl1c_fix_features(struct net_device *netdev,
else
features &= ~NETIF_F_HW_VLAN_CTAG_TX;

- if (netdev->mtu > MAX_TSO_FRAME_SIZE)
- features &= ~(NETIF_F_TSO | NETIF_F_TSO6);
+ if (hw->nic_type != athr_mt) {
+ if (netdev->mtu > MAX_TSO_FRAME_SIZE)
+ features &= ~(NETIF_F_TSO | NETIF_F_TSO6);
+ }

return features;
}
@@ -517,6 +521,9 @@ static void atl1c_set_max_mtu(struct net_device *netdev)
netdev->max_mtu = MAX_JUMBO_FRAME_SIZE -
(ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN);
break;
+ case athr_mt:
+ netdev->max_mtu = 9500;
+ break;
/* The 10/100 devices don't support jumbo packets, max_mtu 1500 */
default:
netdev->max_mtu = ETH_DATA_LEN;
--
2.31.1

2021-05-11 21:41:15

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH net-next 2/4] atl1c: improve performance by avoiding unnecessary pcie writes on xmit



On 5/11/21 9:05 PM, Gatis Peisenieks wrote:
> The kernel has xmit_more facility that hints the networking driver xmit
> path about whether more packets are coming soon. This information can be
> used to avoid unnecessary expensive PCIe transaction per tx packet at a
> slight increase in latency.
>
> Max TX pps on Mikrotik 10/25G NIC in a Threadripper 3960X system
> improved from 1150Kpps to 1700Kpps.
>
> Signed-off-by: Gatis Peisenieks <[email protected]>
> ---
> drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> index 28c30d5288e4..2a8ab51b0ed9 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> @@ -2211,8 +2211,8 @@ static int atl1c_tx_map(struct atl1c_adapter *adapter,
> return -1;
> }
>
> -static void atl1c_tx_queue(struct atl1c_adapter *adapter, struct sk_buff *skb,
> - struct atl1c_tpd_desc *tpd, enum atl1c_trans_queue type)
> +static void atl1c_tx_queue(struct atl1c_adapter *adapter,
> + enum atl1c_trans_queue type)
> {
> struct atl1c_tpd_ring *tpd_ring = &adapter->tpd_ring[type];
> u16 reg;
> @@ -2238,6 +2238,7 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,
>
> if (atl1c_tpd_avail(adapter, type) < tpd_req) {
> /* no enough descriptor, just stop queue */
> + atl1c_tx_queue(adapter, type);
> netif_stop_queue(netdev);
> return NETDEV_TX_BUSY;
> }
> @@ -2246,6 +2247,7 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,
>
> /* do TSO and check sum */
> if (atl1c_tso_csum(adapter, skb, &tpd, type) != 0) {
> + atl1c_tx_queue(adapter, type);
> dev_kfree_skb_any(skb);
> return NETDEV_TX_OK;
> }
> @@ -2270,8 +2272,11 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,
> atl1c_tx_rollback(adapter, tpd, type);
> dev_kfree_skb_any(skb);
> } else {
> - netdev_sent_queue(adapter->netdev, skb->len);
> - atl1c_tx_queue(adapter, skb, tpd, type);
> + bool more = netdev_xmit_more();
> +
> + __netdev_sent_queue(adapter->netdev, skb->len, more);


This is probably buggy.

You must check and use the return code of this function,
as in :

bool door_bell = __netdev_sent_queue(adapter->netdev, skb->len, netdev_xmit_more());

if (door_bell)
atl1c_tx_queue(adapter, type);


> + if (!more)
> + atl1c_tx_queue(adapter, type);
> }
>
> return NETDEV_TX_OK;
>

2021-05-12 02:48:32

by Chris Snook

[permalink] [raw]
Subject: Re: [PATCH net-next 2/4] atl1c: improve performance by avoiding unnecessary pcie writes on xmit

Increases in latency tend to hurt more on single-queue devices. Has
this been tested on the original gigabit atl1c?

- Chris

On Tue, May 11, 2021 at 12:05 PM Gatis Peisenieks <[email protected]> wrote:
>
> The kernel has xmit_more facility that hints the networking driver xmit
> path about whether more packets are coming soon. This information can be
> used to avoid unnecessary expensive PCIe transaction per tx packet at a
> slight increase in latency.
>
> Max TX pps on Mikrotik 10/25G NIC in a Threadripper 3960X system
> improved from 1150Kpps to 1700Kpps.
>
> Signed-off-by: Gatis Peisenieks <[email protected]>
> ---
> drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> index 28c30d5288e4..2a8ab51b0ed9 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> @@ -2211,8 +2211,8 @@ static int atl1c_tx_map(struct atl1c_adapter *adapter,
> return -1;
> }
>
> -static void atl1c_tx_queue(struct atl1c_adapter *adapter, struct sk_buff *skb,
> - struct atl1c_tpd_desc *tpd, enum atl1c_trans_queue type)
> +static void atl1c_tx_queue(struct atl1c_adapter *adapter,
> + enum atl1c_trans_queue type)
> {
> struct atl1c_tpd_ring *tpd_ring = &adapter->tpd_ring[type];
> u16 reg;
> @@ -2238,6 +2238,7 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,
>
> if (atl1c_tpd_avail(adapter, type) < tpd_req) {
> /* no enough descriptor, just stop queue */
> + atl1c_tx_queue(adapter, type);
> netif_stop_queue(netdev);
> return NETDEV_TX_BUSY;
> }
> @@ -2246,6 +2247,7 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,
>
> /* do TSO and check sum */
> if (atl1c_tso_csum(adapter, skb, &tpd, type) != 0) {
> + atl1c_tx_queue(adapter, type);
> dev_kfree_skb_any(skb);
> return NETDEV_TX_OK;
> }
> @@ -2270,8 +2272,11 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,
> atl1c_tx_rollback(adapter, tpd, type);
> dev_kfree_skb_any(skb);
> } else {
> - netdev_sent_queue(adapter->netdev, skb->len);
> - atl1c_tx_queue(adapter, skb, tpd, type);
> + bool more = netdev_xmit_more();
> +
> + __netdev_sent_queue(adapter->netdev, skb->len, more);
> + if (!more)
> + atl1c_tx_queue(adapter, type);
> }
>
> return NETDEV_TX_OK;
> --
> 2.31.1
>

2021-05-12 07:56:38

by Gatis Peisenieks

[permalink] [raw]
Subject: Re: [PATCH net-next 2/4] atl1c: improve performance by avoiding unnecessary pcie writes on xmit



On 2021-05-12 00:39, Eric Dumazet wrote:
>> @@ -2270,8 +2272,11 @@ static netdev_tx_t atl1c_xmit_frame(struct
>> sk_buff *skb,
>> atl1c_tx_rollback(adapter, tpd, type);
>> dev_kfree_skb_any(skb);
>> } else {
>> - netdev_sent_queue(adapter->netdev, skb->len);
>> - atl1c_tx_queue(adapter, skb, tpd, type);
>> + bool more = netdev_xmit_more();
>> +
>> + __netdev_sent_queue(adapter->netdev, skb->len, more);
>
>
> This is probably buggy.
>
> You must check and use the return code of this function,
> as in :
>
> bool door_bell = __netdev_sent_queue(adapter->netdev, skb->len,
> netdev_xmit_more());
>
> if (door_bell)
> atl1c_tx_queue(adapter, type);
>

Eric, thank you for taking your time to look at this!

You are correct, tx queue can get stopped in __netdev_sent_queue
and if there were more packets coming the submit to HW would be
missed / unnecessarily delayed.


>
>> + if (!more)
>> + atl1c_tx_queue(adapter, type);
>> }
>>
>> return NETDEV_TX_OK;
>>

2021-05-12 08:36:30

by David Laight

[permalink] [raw]
Subject: RE: [PATCH net-next 2/4] atl1c: improve performance by avoiding unnecessary pcie writes on xmit

From: Chris Snook <[email protected]>
> Sent: 12 May 2021 03:40
>
> On Tue, May 11, 2021 at 12:05 PM Gatis Peisenieks <[email protected]> wrote:
> >
> > The kernel has xmit_more facility that hints the networking driver xmit
> > path about whether more packets are coming soon. This information can be
> > used to avoid unnecessary expensive PCIe transaction per tx packet at a
> > slight increase in latency.
>
> Increases in latency tend to hurt more on single-queue devices. Has
> this been tested on the original gigabit atl1c?

It probably depends a lot on how expensive it is to 'kick' the mac unit.

A simple (posted) PCIe write when the PCIe host interface is idle (as is likely
when you've just been updating descriptors) is probably noise compared
to the rest of the cost of sending the packet.
(Eric will probably say they measured gains.)

OTOH if you have (as I have on one system) the e1000e driver and some
completely broken 'management interface' hardware which means it can
take a lot of microseconds to write to any MAC register you really
do need to look at netdev_xmit_more() [1].

Unfortunately it doesn't help that much.
netdev_xmit_more() reports the state of the tx queue when the current
skb transmit was passed to the mac driver.
It doesn't report the state of the queue at the time netdev_xmit_more()
is called - so any packets queued while the transmit setup is in
progress don't cause netdev_xmit_more() to return true.
I've traced this happening repeatedly...

[1] If the MI is active MAC writes are broken (may write to the
wrong register), so there is horrid code before each access that
(IIRC) effectively does:
while (mi_active())
mdelay(10);
This is just so broken (interrupts are even enabled).

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2021-05-12 14:37:59

by Gatis Peisenieks

[permalink] [raw]
Subject: Re: [PATCH net-next 2/4] atl1c: improve performance by avoiding unnecessary pcie writes on xmit



On 2021-05-12 05:39, Chris Snook wrote:
> Increases in latency tend to hurt more on single-queue devices. Has
> this been tested on the original gigabit atl1c?

Thank you Chris, for checking this out!

I did test the atl1c driver with and without this change on actual
AR8151 hardware.

My test system was Intel(R) Core(TM) i7-4790K + RB44Ge.
That is a 4 port 1G AR8151 based card.

I measured latency with external traffic generator with test system
doing L2 forwarding. Receiving traffic on one atl1c interface and
transmitting over another atl1c interface. I had default 1000 packet
pfifo queue configured on the atl1c interfaces.

Max 64 byte packet L2 forward pps rate improved 860K -> 1070K.

Any latency difference at 800Kpps was lost in the noise - with the
particular traffic generator system (a linux based RouterOS
traffic-gen).
I measured average 285us for a 30 second run in both cases. Note that
this includes any traffic generator "internal" latency.

Note that I had to tweak atl1c tx interrupt moderation to get these
numbers. With default tx_imt = 1000 no matter what I get only 500
tx interrupts/sec. Since the tx clean is fast and do not get polled
repeatedly and ring size is 1024 I am limited to ~500Kpps.
tx_imt = 500 dobubles that, I used tx_imt = 200 for this test.

As a side note that still relates to latency discussion on AR8151
hardware what I did find out however is that rx interrupt moderation
timer value has a big effect on latency. Changing rx_imt
from 200 to 10 resulted in considerable improvement from 285us to 41us
average latency as measured by traffic generator. I do not have
enough knowledge of the quirks of all the hardware supported by
the driver to confidently put this in a patch though.

Mikrotik 10/25G NIC has its own interrupt moderation mechanism,
so this is not relevant to that if anyone is interested.


>
> - Chris
>
> On Tue, May 11, 2021 at 12:05 PM Gatis Peisenieks <[email protected]>
> wrote:
>>
>> The kernel has xmit_more facility that hints the networking driver
>> xmit
>> path about whether more packets are coming soon. This information can
>> be
>> used to avoid unnecessary expensive PCIe transaction per tx packet at
>> a
>> slight increase in latency.
>>
>> Max TX pps on Mikrotik 10/25G NIC in a Threadripper 3960X system
>> improved from 1150Kpps to 1700Kpps.
>>