2017-03-23 10:08:37

by Corentin Labbe

Subject: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"

Hello

Using next-20170323 produces a huge performance regression on my sunxi boards.
On dwmac-sun8i, iperf goes from 94 Mbit/s to 37 Mbit/s when sending.

On cubieboard2 (dwmac-sunxi), iperf makes the kernel flood with "ndesc_get_rx_status: Oversized frame spanned multiple buffers"
and the network is lost afterwards.

Reverting aff3d9eff84399e433c4aca65a9bb236581bc082 fixes the issue.
I am still trying to find which part of this patch lowers the performance.

Regards
Corentin Labbe


2017-03-23 10:12:26

by Joao Pinto

Subject: Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"


Hi Corentin,

At 10:08 AM on 3/23/2017, Corentin Labbe wrote:
> Hello
>
> Using next-20170323 produces a huge performance regression on my sunxi boards.
> On dwmac-sun8i, iperf goes from 94 Mbit/s to 37 Mbit/s when sending.
>
> On cubieboard2 (dwmac-sunxi), iperf makes the kernel flood with "ndesc_get_rx_status: Oversized frame spanned multiple buffers"
> and the network is lost afterwards.
>
> Reverting aff3d9eff84399e433c4aca65a9bb236581bc082 fixes the issue.
> I am still trying to find which part of this patch lowers the performance.
>
> Regards
> Corentin Labbe
>

I have a 4.21 QoS Core with 4 RX + 4 TX queues and detected no regression.
Could you please share the iperf commands you are using so that I can
reproduce on my side?

@stmmac users: It would be great if people who have a setup could also run
the same iperf test in order to clear this up for everyone.

Thanks,
Joao

2017-03-23 10:20:45

by Corentin Labbe

Subject: Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"

On Thu, Mar 23, 2017 at 10:12:18AM +0000, Joao Pinto wrote:
>
> Hi Corentin,
>
> At 10:08 AM on 3/23/2017, Corentin Labbe wrote:
> > Hello
> >
> > Using next-20170323 produces a huge performance regression on my sunxi boards.
> > On dwmac-sun8i, iperf goes from 94 Mbit/s to 37 Mbit/s when sending.
> >
> > On cubieboard2 (dwmac-sunxi), iperf makes the kernel flood with "ndesc_get_rx_status: Oversized frame spanned multiple buffers"
> > and the network is lost afterwards.
> >
> > Reverting aff3d9eff84399e433c4aca65a9bb236581bc082 fixes the issue.
> > I am still trying to find which part of this patch lowers the performance.
> >
> > Regards
> > Corentin Labbe
> >
>
> I have a 4.21 QoS Core with 4 RX + 4 TX queues and detected no regression.
> Could you please share the iperf commands you are using so that I can
> reproduce on my side?

Just a simple "iperf -c serverip" on both boards.

2017-03-23 10:40:50

by Joao Pinto

Subject: Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"

At 10:20 AM on 3/23/2017, Corentin Labbe wrote:
> On Thu, Mar 23, 2017 at 10:12:18AM +0000, Joao Pinto wrote:
>>
>> Hi Corentin,
>>
>> At 10:08 AM on 3/23/2017, Corentin Labbe wrote:
>>> Hello
>>>
>>> Using next-20170323 produces a huge performance regression on my sunxi boards.
>>> On dwmac-sun8i, iperf goes from 94 Mbit/s to 37 Mbit/s when sending.
>>>
>>> On cubieboard2 (dwmac-sunxi), iperf makes the kernel flood with "ndesc_get_rx_status: Oversized frame spanned multiple buffers"
>>> and the network is lost afterwards.
>>>
>>> Reverting aff3d9eff84399e433c4aca65a9bb236581bc082 fixes the issue.
>>> I am still trying to find which part of this patch lowers the performance.
>>>
>>> Regards
>>> Corentin Labbe
>>>
>>
>> I have a 4.21 QoS Core with 4 RX + 4 TX queues and detected no regression.
>> Could you please share the iperf commands you are using so that I can
>> reproduce on my side?
>
> Just a simple "iperf -c serverip" on both boards.
>

OK, I am going to run my tests with a fresh net-next and come back to you soon.

Thanks,
Joao

2017-03-23 10:49:10

by Peppe CAVALLARO

Subject: Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"

Hello

On 3/23/2017 11:20 AM, Corentin Labbe wrote:
>> I have a 4.21 QoS Core with 4 RX + 4 TX queues and detected no regression.
>> Could you please share the iperf commands you are using so that I can
>> reproduce on my side?

Joao, you have a really powerful HW integration, with multiple channels
for both RX and TX.
Often this is not the case for other setups, where usually just DMA0 is
present or, sometimes, there is just one extra RX channel.

My question is: what happens on this kind of configuration? Are we
still guaranteeing the best performance?

We also have to guarantee that TSO and SG keep working. Another point
is the buffer sizes, which can differ among platforms.

The problem reported by Corentin below pushes me to think that there is
a bug, so we should understand when it was introduced and whether it can
be fixed by some configuration we are not taking care of right now.

"ndesc_get_rx_status: Oversized frame spanned multiple buffers"
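
For reference, here is a minimal sketch of the kind of check that prints
this warning, modeled on the normal descriptor RX path
(ndesc_get_rx_status() in norm_desc.c); the exact flag and field names may
differ in the current tree:

/* Sketch: if the Last Descriptor bit is not set in RDES0, the frame did
 * not fit into a single RX buffer, so it is counted as a length error
 * and the frame is discarded.
 */
static int ndesc_get_rx_status(void *data, struct stmmac_extra_stats *x,
			       struct dma_desc *p)
{
	struct net_device_stats *stats = (struct net_device_stats *)data;
	unsigned int rdes0 = le32_to_cpu(p->des0);

	if (unlikely(rdes0 & RDES0_OWN))
		return dma_own;	/* descriptor still owned by the DMA */

	if (unlikely(!(rdes0 & RDES0_LAST_DESCRIPTOR))) {
		pr_warn("%s: Oversized frame spanned multiple buffers\n",
			__func__);
		stats->rx_length_errors++;
		return discard_frame;
	}

	/* ... error summary, checksum and timestamp handling follow ... */
	return good_frame;
}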


Best Regards
Peppe

2017-03-23 10:51:31

by Peppe CAVALLARO

Subject: Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"

On 3/23/2017 11:48 AM, Giuseppe CAVALLARO wrote:
> Hello
>
> On 3/23/2017 11:20 AM, Corentin Labbe wrote:
>>> I have a 4.21 QoS Core with 4 RX + 4 TX queues and detected no regression.
>>> Could you please share the iperf commands you are using so that I can
>>> reproduce on my side?
>
> Joao, you have a really powerful HW integration, with multiple channels
> for both RX and TX.
> Often this is not the case for other setups, where usually just DMA0 is
> present or, sometimes, there is just one extra RX channel.
>
> My question is: what happens on this kind of configuration? Are we
> still guaranteeing the best performance?
>
> We also have to guarantee that TSO and SG keep working. Another point
> is the buffer sizes, which can differ among platforms.
>
> The problem reported by Corentin below pushes me to think that there is
> a bug, so we should understand when it was introduced and whether it can
> be fixed by some configuration we are not taking care of right now.
>
> "ndesc_get_rx_status: Oversized frame spanned multiple buffers"

I wonder if this could be easily triggered by fetching a big file via
FTP, so it may not be strictly related to performance benchmarks.

peppe

>
>
> Best Regards
> Peppe
>

2017-03-23 10:55:10

by Joao Pinto

Subject: Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"


Hi Peppe,

At 10:48 AM on 3/23/2017, Giuseppe CAVALLARO wrote:
> Hello
>
> On 3/23/2017 11:20 AM, Corentin Labbe wrote:
>>> I have a 4.21 QoS Core with 4 RX + 4 TX queues and detected no regression.
>>> Could you please share the iperf commands you are using so that I can
>>> reproduce on my side?
>
> Joao, you have a really powerful HW integration, with multiple channels
> for both RX and TX.
> Often this is not the case for other setups, where usually just DMA0 is
> present or, sometimes, there is just one extra RX channel.

My opinion is that we should not have problems, since the majority of the
features introduced are only used if you configure rx queues > 1 or tx
queues > 1, so if you use the default (=1) those configurations will not
take place.
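
For illustration, a sketch of what the platform data looks like in the
default single-queue case (these match the stock stmmac_pci defaults; the
helper name below is made up):

/* Sketch: single-queue defaults. With both counts left at 1, the new
 * multi-queue MTL configuration paths are skipped and only DMA channel 0
 * is used, as before the patch.
 */
static void stmmac_single_queue_defaults(struct plat_stmmacenet_data *plat)
{
	/* Set default number of RX and TX queues to use */
	plat->tx_queues_to_use = 1;
	plat->rx_queues_to_use = 1;

	/* Disable priority config by default */
	plat->tx_queues_cfg[0].use_prio = false;
	plat->rx_queues_cfg[0].use_prio = false;

	/* Disable RX queue routing by default */
	plat->rx_queues_cfg[0].pkt_route = 0x0;
}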

>
> My question is: what happens on this kind of configuration? Are we still
> guaranteeing the best performance?
>
> We also have to guarantee that TSO and SG keep working. Another point is
> the buffer sizes, which can differ among platforms.

We have to pay attention to the RX buffer size, since I have had problems
with DHCP messages not being received because the buffer size was too small.
Currently the TX buffer size is not configurable, and in the future it
would be useful to make it configurable too.
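
For reference, stmmac derives the RX DMA buffer size from the MTU roughly
like this (a sketch of stmmac_set_bfsize() from stmmac_main.c; the exact
thresholds may differ between versions):

/* Sketch: pick an RX buffer size large enough for the MTU. A buffer
 * that is too small for the frames actually received is exactly what
 * makes a frame span multiple descriptors.
 */
static int stmmac_set_bfsize(int mtu, int bufsize)
{
	int ret = bufsize;

	if (mtu >= BUF_SIZE_4KiB)
		ret = BUF_SIZE_8KiB;
	else if (mtu >= BUF_SIZE_2KiB)
		ret = BUF_SIZE_4KiB;
	else if (mtu > DEFAULT_BUFSIZE)
		ret = BUF_SIZE_2KiB;
	else
		ret = DEFAULT_BUFSIZE;

	return ret;
}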

>
> The problem reported by Corentin below pushes me to think that there is a
> bug, so we should understand when it was introduced and whether it can be
> fixed by some configuration we are not taking care of right now.

Of course.

>
> "ndesc_get_rx_status: Oversized frame spanned multiple buffers"
>
>
> Best Regards
> Peppe

Thanks,
Joao

2017-03-23 10:59:10

by Joao Pinto

Subject: Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"

At 10:51 AM on 3/23/2017, Giuseppe CAVALLARO wrote:
> On 3/23/2017 11:48 AM, Giuseppe CAVALLARO wrote:
>> Hello
>>
>> On 3/23/2017 11:20 AM, Corentin Labbe wrote:
>>>> I have a 4.21 QoS Core with 4 RX + 4 TX queues and detected no regression.
>>>> Could you please share the iperf commands you are using so that I can
>>>> reproduce on my side?
>>
>> Joao, you have a really powerful HW integration, with multiple channels
>> for both RX and TX.
>> Often this is not the case for other setups, where usually just DMA0 is
>> present or, sometimes, there is just one extra RX channel.
>>
>> My question is: what happens on this kind of configuration? Are we still
>> guaranteeing the best performance?
>>
>> We also have to guarantee that TSO and SG keep working. Another point is
>> the buffer sizes, which can differ among platforms.
>>
>> The problem reported by Corentin below pushes me to think that there is a
>> bug, so we should understand when it was introduced and whether it can be
>> fixed by some configuration we are not taking care of right now.
>>
>> "ndesc_get_rx_status: Oversized frame spanned multiple buffers"
>
> I wonder if this could be easily triggered by fetching a big file via FTP,
> so it may not be strictly related to performance benchmarks.

I am going to run that test and also run iperf a couple of times. I am
counting on doing this today and will send you the results later. If anyone
gets results sooner, please share.

>
> peppe
>
>>
>>
>> Best Regards
>> Peppe
>>
>

Thanks.

2017-03-23 12:56:32

by Joao Pinto

Subject: Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"

At 10:56 AM on 3/23/2017, Joao Pinto wrote:
> At 10:51 AM on 3/23/2017, Giuseppe CAVALLARO wrote:
>> On 3/23/2017 11:48 AM, Giuseppe CAVALLARO wrote:
>>> Hello
>>>
>>> On 3/23/2017 11:20 AM, Corentin Labbe wrote:
>>>>> I have a 4.21 QoS Core with 4 RX + 4 TX queues and detected no regression.
>>>>> Could you please share the iperf commands you are using so that I can
>>>>> reproduce on my side?
>>>

HW version: 4.21 QoS Core on a HAPS DX7 (FPGA).
The connection between the FPGA and the PC where stmmac is running is PCIe.
My configuration is done in stmmac_pci. Here it is:

@@ -68,10 +70,52 @@ static void stmmac_default_data(struct plat_stmmacenet_data *plat)
 {
 	plat->bus_id = 1;
 	plat->phy_addr = 0;
-	plat->interface = PHY_INTERFACE_MODE_GMII;
-	plat->clk_csr = 2; /* clk_csr_i = 20-35MHz & MDC = clk_csr_i/16 */
-	plat->has_gmac = 1;
-	plat->force_sf_dma_mode = 1;
+	plat->interface = PHY_INTERFACE_MODE_SGMII;
+	plat->clk_csr = 0x5;
+	plat->has_gmac = 0;
+	plat->has_gmac4 = 1;
+	plat->force_sf_dma_mode = 0;
+
+	plat->rx_queues_to_use = 4;
+	plat->tx_queues_to_use = 4;
+
+	plat->rx_sched_algorithm = MTL_RX_ALGORITHM_SP;
+
+	plat->rx_queues_cfg[0].mode_to_use = MTL_QUEUE_AVB;
+	plat->rx_queues_cfg[1].mode_to_use = MTL_QUEUE_DCB;
+	plat->rx_queues_cfg[2].mode_to_use = MTL_QUEUE_DCB;
+	plat->rx_queues_cfg[3].mode_to_use = MTL_QUEUE_DCB;
+
+	plat->tx_queues_cfg[0].mode_to_use = MTL_QUEUE_DCB;
+	plat->tx_queues_cfg[1].mode_to_use = MTL_QUEUE_AVB;
+	plat->tx_queues_cfg[2].mode_to_use = MTL_QUEUE_DCB;
+	plat->tx_queues_cfg[3].mode_to_use = MTL_QUEUE_DCB;
+
+	plat->tx_queues_cfg[1].send_slope = 0xCCC;
+	plat->tx_queues_cfg[1].idle_slope = 0x1333;
+	plat->tx_queues_cfg[1].high_credit = 0x4B0000;
+	plat->tx_queues_cfg[1].low_credit = 0xFFB50000;
+
+	plat->rx_queues_cfg[0].chan = 0;
+	plat->rx_queues_cfg[1].chan = 1;
+	plat->rx_queues_cfg[2].chan = 2;
+	plat->rx_queues_cfg[3].chan = 3;
+
+	plat->tx_sched_algorithm = MTL_TX_ALGORITHM_WRR;
+	plat->tx_queues_cfg[0].weight = 0x10;
+	plat->tx_queues_cfg[1].weight = 0x11;
+	plat->tx_queues_cfg[2].weight = 0x12;
+	plat->tx_queues_cfg[3].weight = 0x13;
+
+	/* Disable Priority config by default */
+	plat->tx_queues_cfg[0].use_prio = false;
+	plat->rx_queues_cfg[0].use_prio = false;
+
+	/* Disable RX queues routing by default */
+	plat->rx_queues_cfg[0].pkt_route = 0x0;
+	plat->rx_queues_cfg[1].pkt_route = 0x0;
+	plat->rx_queues_cfg[2].pkt_route = 0x0;
+	plat->rx_queues_cfg[3].pkt_route = 0x0;
 
 	plat->mdio_bus_data->phy_reset = NULL;
 	plat->mdio_bus_data->phy_mask = 0;
@@ -83,22 +127,14 @@ static void stmmac_default_data(struct plat_stmmacenet_data *plat)
 	/* Set default value for multicast hash bins */
 	plat->multicast_filter_bins = HASH_TABLE_SIZE;
 
+	plat->dma_cfg->fixed_burst = 0;
+	plat->dma_cfg->aal = 0;
+
 	/* Set default value for unicast filter entries */
 	plat->unicast_filter_entries = 1;
 
 	/* Set the maxmtu to a default of JUMBO_LEN */
 	plat->maxmtu = JUMBO_LEN;
-
-	/* Set default number of RX and TX queues to use */
-	plat->tx_queues_to_use = 1;
-	plat->rx_queues_to_use = 1;
-
-	/* Disable Priority config by default */
-	plat->tx_queues_cfg[0].use_prio = false;
-	plat->rx_queues_cfg[0].use_prio = false;
-
-	/* Disable RX queues routing by default */
-	plat->rx_queues_cfg[0].pkt_route = 0x0;
 }


******* TESTS *******


*TEST 1: file (linux-next tarball) transfer of ~1.4 GB via scp to the DUT*

scp net-next-20170323.tar.gz xxxxx@XXXXXXX:/home/synopsys/
The authenticity of host 'XXXXX' can't be established.
ECDSA key fingerprint is SHA256:/XXXXXX.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'XXXXXX' (ECDSA) to the list of known hosts.
XXXXXX@XXXXX's password:
net-next20170323.tar.gz                          100% 1366MB  79.3MB/s   00:17

ifconfig after transfer:

eth1 Link encap:Ethernet HWaddr XXXX
inet addr:XXXX Bcast:XXXX Mask:XXXX
inet6 addr: XXXXX Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1026614 errors:0 dropped:0 overruns:0 frame:0
TX packets:56804 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1502856063 (1.5 GB) TX bytes:4224767 (4.2 MB)
Interrupt:16

*stmmac log after transfer:*

#:~/temp$ dmesg | grep stmmac
[ 0.278200] stmmac - user ID: 0x10, Synopsys ID: 0x42
[ 0.278207] stmmaceth 0000:01:00.0: DMA HW capability register supported
[ 0.278209] stmmaceth 0000:01:00.0: RX Checksum Offload Engine supported
[ 0.278211] stmmaceth 0000:01:00.0: TX Checksum insertion supported
[ 0.278224] stmmaceth 0000:01:00.0: Enable RX Mitigation via HW Watchdog Timer
[ 0.315596] libphy: stmmac: probed
[ 0.315601] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized): PHY ID 01410cc2 at 0 IRQ POLL (stmmac-1:00) active
[ 0.315605] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized): PHY ID 01410cc2 at 1 IRQ POLL (stmmac-1:01)
[ 0.315608] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized): PHY ID 01410cc2 at 2 IRQ POLL (stmmac-1:02)
[ 0.315612] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized): PHY ID 01410cc2 at 3 IRQ POLL (stmmac-1:03)
[ 13.380009] Generic PHY stmmac-1:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=stmmac-1:00, irq=-1)
[ 13.390093] stmmaceth 0000:01:00.0 eth1: IEEE 1588-2008 Advanced Timestamp supported
[ 13.390200] stmmaceth 0000:01:00.0 eth1: registered PTP clock
[ 14.436743] stmmaceth 0000:01:00.0 eth1: Link is Up - 1Gbps/Full - flow control off
[ 21.056476] stmmac_set_wol+0x55/0xc0

Conclusions: No packets lost, clean stmmac log.


*TEST 2: iperf*

Server side:

#:/media/DevDisk/gitrepo/mainline-net$ iperf -s -B XXXX.0.3
------------------------------------------------------------
Server listening on TCP port 5001
Binding to local address XXXXX.0.3
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local XXXX.0.3 port 5001 connected with XXXX.0.2 port 54092
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-20.1 sec 1.03 GBytes 443 Mbits/sec

Client side:

#:~/temp$ iperf -c XXXX.0.3 --port 5001 -t 20 -i 5
------------------------------------------------------------
Client connecting to XXXX.0.3, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local XXXXX.0.2 port 54092 connected with XXXXX.0.3 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 5.0 sec 265 MBytes 445 Mbits/sec
[ 3] 5.0-10.0 sec 265 MBytes 444 Mbits/sec
[ 3] 10.0-15.0 sec 264 MBytes 444 Mbits/sec
[ 3] 15.0-20.0 sec 263 MBytes 442 Mbits/sec
[ 3] 0.0-20.0 sec 1.03 GBytes 444 Mbits/sec


Thanks,
Joao