LinuxLists.cc - dsa_master_find_slave()'s time complexity and potential performance hit

2021-03-04 05:49:09

Subject: dsa_master_find_slave()'s time complexity and potential performance hit

Since commit 7b9a2f4bac68 ("net: dsa: use ports list to find slave"),
dsa_master_find_slave() has been iterating over a linked list instead
of accessing arrays, making its time complexity O(n).
The said function is called frequently in DSA RX path, so it may cause
a performance hit, especially for switches that have many ports (20+)
such as RTL8380/8390/9300 (There is a downstream DSA driver for it,
see https://github.com/openwrt/openwrt/tree/openwrt-21.02/target/linux/realtek/files-5.4/drivers/net/dsa/rtl83xx).
I don't have one of those switches, so I can't test if the performance
impact is huge or not.

2021-03-04 06:16:03

by Vladimir Oltean

Subject: Re: dsa_master_find_slave()'s time complexity and potential performance hit

On Tue, Mar 02, 2021 at 01:51:42PM +0800, DENG Qingfang wrote:
> Since commit 7b9a2f4bac68 ("net: dsa: use ports list to find slave"),
> dsa_master_find_slave() has been iterating over a linked list instead
> of accessing arrays, making its time complexity O(n).
> The said function is called frequently in DSA RX path, so it may cause
> a performance hit, especially for switches that have many ports (20+)
> such as RTL8380/8390/9300 (There is a downstream DSA driver for it,
> see https://github.com/openwrt/openwrt/tree/openwrt-21.02/target/linux/realtek/files-5.4/drivers/net/dsa/rtl83xx).
> I don't have one of those switches, so I can't test if the performance
> impact is huge or not.

You actually can test that: you could create a tagger in mainline based
on the rtl83xx tagger from downstream, and then modify dsa_loop to use
DSA_TAG_PROTO_RTL83XX.

Then you can craft some packets and inject them into the port on which
dsa_loop is attached using tcpreplay.
What I do is:
- I initially send some packets using the xmit function of the tagger,
just to have an initial template to start with. This assumes that the
xmit format is more or less similar to the rcv format.
- capture those xmit packets using tcpdump -i eth0 -Q out -w tagger-xmit.pcap
- then open tagger-xmit.pcap in wireshark, run Export Specified Packets
and save it in the K12 text file format as tagger-xmit.txt
- edit the tagger-xmit.txt file according to my liking, in this case you
would have to create a receive packet on port 19 (the one where it's
most expensive to do the linear lookup of the ports list)
- import the edited tagger-xmit.txt file again in Wireshark and save it
as a new tagger-rcv.pcap
- run tcpreplay on that pcap file in a loop

I would probably go with a very small packet size (64 bytes), and enable
IP routing between two DSA interfaces lan0 and lan1:

ip link set lan0 address de:ad:be:ef:00:00
ip link set lan1 address de:ad:be:ef:00:01
ip addr add 192.168.100.2/24 dev lan0
ip addr add 192.168.101.2/24 dev lan1
echo 1 > /proc/sys/net/ipv4/ip_forward
arp -s 192.168.100.1 00:01:02:03:04:05 dev lan0 # towards spoofed sender
arp -s 192.168.101.1 00:01:02:03:04:06 dev lan1 # towards spoofed receiver

I would make sure the test packet from tagger-rcv.pcap has:
- a source MAC address corresponding to your spoofed sender (in my
example 00:01:02:03:04:05).
- a source IP address corresponding to your spoofed sender (in my
example 192.168.100.1)
- a destination MAC address corresponding to the lan0 interface
(de:ad:be:ef:00:00)
- a destination IP address corresponding to the spoofed receiver
(192.168.101.1, on the lan1 subnet but not lan1's own address)
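
As a rough illustration of the four fields above, here is a minimal C sketch
that fills in an untagged Ethernet + IPv4 + UDP frame. The names
build_test_frame, ipv4_checksum and FRAME_LEN are made up for this example,
the UDP ports are arbitrary, and the switch tag is deliberately left out
because its layout depends on the tagger. It is not a replacement for the
Wireshark editing step, only a way to see which bytes need to change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN 14
#define IP_HDR_LEN  20
#define UDP_HDR_LEN 8
#define FRAME_LEN   60 /* 64 bytes on the wire once the 4-byte FCS is added */

/* Internet checksum (RFC 1071) over an even-length header. */
static uint16_t ipv4_checksum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;

	assert(len % 2 == 0);
	for (size_t i = 0; i < len; i += 2)
		sum += (uint32_t)(hdr[i] << 8 | hdr[i + 1]);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Fill buf with the untagged test frame; returns its length in bytes. */
static size_t build_test_frame(uint8_t buf[FRAME_LEN])
{
	static const uint8_t dst_mac[6] = {0xde, 0xad, 0xbe, 0xef, 0x00, 0x00}; /* lan0 */
	static const uint8_t src_mac[6] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05}; /* spoofed sender */
	static const uint8_t src_ip[4] = {192, 168, 100, 1}; /* spoofed sender */
	static const uint8_t dst_ip[4] = {192, 168, 101, 1}; /* spoofed receiver */
	uint8_t *ip = buf + ETH_HDR_LEN;
	uint8_t *udp = ip + IP_HDR_LEN;
	uint16_t ip_len = FRAME_LEN - ETH_HDR_LEN;
	uint16_t udp_len = ip_len - IP_HDR_LEN;
	uint16_t csum;

	memset(buf, 0, FRAME_LEN);
	memcpy(buf, dst_mac, 6);
	memcpy(buf + 6, src_mac, 6);
	buf[12] = 0x08; /* EtherType 0x0800: IPv4 */
	buf[13] = 0x00;

	ip[0] = 0x45; /* version 4, header length 5 words */
	ip[2] = ip_len >> 8;
	ip[3] = ip_len & 0xff;
	ip[8] = 64; /* TTL */
	ip[9] = 17; /* protocol: UDP */
	memcpy(ip + 12, src_ip, 4);
	memcpy(ip + 16, dst_ip, 4);
	csum = ipv4_checksum(ip, IP_HDR_LEN); /* checksum field is still zero here */
	ip[10] = csum >> 8;
	ip[11] = csum & 0xff;

	udp[1] = 9; /* source port 9 (arbitrary) */
	udp[3] = 9; /* destination port 9 (arbitrary) */
	udp[4] = udp_len >> 8;
	udp[5] = udp_len & 0xff; /* UDP checksum left at 0, i.e. not computed */
	return FRAME_LEN;
}
```

A real receive packet would additionally carry the switch tag at whatever
position the tagger's rcv function expects it.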

Then the network stack should route the packet received on lan0:
it replaces the destination MAC address with that of the spoofed receiver
(00:01:02:03:04:06), decrements the IP TTL to 63 and sends the packet
through lan1 according to the routing table.
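
To check a capture taken on the egress side, it helps to know what the
forwarded frame should look like. The following C sketch applies the same
rewrite to an untagged frame in userspace. forward_rewrite and ipv4_checksum
are hypothetical helper names, not kernel functions, and the sketch simply
recomputes the header checksum, whereas the kernel updates it incrementally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN 14

/* Internet checksum (RFC 1071) over an even-length header. */
static uint16_t ipv4_checksum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;

	assert(len % 2 == 0);
	for (size_t i = 0; i < len; i += 2)
		sum += (uint32_t)(hdr[i] << 8 | hdr[i + 1]);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Rewrite an untagged Ethernet + IPv4 frame the way a router would:
 * new MAC addresses, TTL decremented, header checksum refreshed.
 * Returns 0 on success, -1 if the frame is malformed or the TTL expires.
 */
static int forward_rewrite(uint8_t *frame, size_t len,
			   const uint8_t new_dst[6], const uint8_t new_src[6])
{
	uint8_t *ip = frame + ETH_HDR_LEN;
	size_t ihl;
	uint16_t csum;

	if (len < ETH_HDR_LEN + 20)
		return -1;
	ihl = (size_t)(ip[0] & 0x0f) * 4;
	if ((ip[0] >> 4) != 4 || ihl < 20 || len < ETH_HDR_LEN + ihl)
		return -1;
	if (ip[8] <= 1) /* TTL would reach zero: a router drops the packet */
		return -1;

	memcpy(frame, new_dst, 6);     /* next hop: the spoofed receiver */
	memcpy(frame + 6, new_src, 6); /* egress interface, lan1 here */
	ip[8]--;                       /* e.g. 64 becomes 63 */
	ip[10] = 0;
	ip[11] = 0;
	csum = ipv4_checksum(ip, ihl);
	ip[10] = csum >> 8;
	ip[11] = csum & 0xff;
	return 0;
}
```

Because the TTL shares a 16-bit word with the protocol field, decrementing
it by one raises the stored checksum by 0x0100 in the common case, which is
an easy thing to spot when comparing the ingress and egress captures.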

To make sure your throughput is consistent you can do some things such
as add a static flow steering rule on the DSA master to ensure the
packets from the same flow are affine to the same CPU, and that if you
send bidirectional traffic, it gets load balanced across multiple CPUs:

ethtool --config-nfc eth0 flow-type ether dst de:ad:be:ef:00:00 m ff:ff:ff:ff:ff:ff action 0
ethtool --config-nfc eth0 flow-type ether dst de:ad:be:ef:00:01 m ff:ff:ff:ff:ff:ff action 1

Also, you should probably turn off GRO since it's not useful with IP
forwarding and it takes a lot of time to do the re-segmentation on TX,
to recalculate the checksums and all.

ethtool -K lan0 gro off
ethtool -K lan1 gro off

You could probably adjust things a bit, like for example see if the rcv
throughput on lan19 is measurably lower than the throughput on lan0.

That should give you a baseline. Only then would I start hacking at
dsa_master_find_slave and see what benefit it brings to replace the list
lookup with something of constant time complexity, such as an array
indexed by port number.
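
For reference, the difference between the two lookup strategies can be
sketched in plain userspace C. Everything here (struct port, struct
port_table, MAX_PORTS, find_port_list, find_port_array) is a simplified
stand-in invented for the example and not the kernel's actual data
structures or the code of dsa_master_find_slave.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PORTS 32 /* hypothetical upper bound on ports per switch */

/* Simplified stand-in for a switch port. */
struct port {
	int switch_index;
	int port_index;
	struct port *next; /* singly linked, standing in for the ports list */
};

/* O(n): walk the list until both the switch and port index match. */
static struct port *find_port_list(struct port *head, int sw, int idx)
{
	for (struct port *p = head; p; p = p->next)
		if (p->switch_index == sw && p->port_index == idx)
			return p;
	return NULL;
}

/* Per-switch table indexed directly by port number. */
struct port_table {
	struct port *by_index[MAX_PORTS];
};

/* Register a port in the table once, e.g. at setup time. */
static void port_table_add(struct port_table *t, struct port *p)
{
	assert(p->port_index >= 0 && p->port_index < MAX_PORTS);
	t->by_index[p->port_index] = p;
}

/* O(1): a bounds check and a single array access. */
static struct port *find_port_array(const struct port_table *t, int idx)
{
	if (idx < 0 || idx >= MAX_PORTS)
		return NULL;
	return t->by_index[idx];
}
```

With 20 ports on one switch, find_port_list visits 20 nodes to reach port
19, while find_port_array does the same amount of work for every port. The
array costs memory proportional to MAX_PORTS per switch, which is the
trade-off to weigh against whatever the baseline measurement shows.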

I'm curious what you come up with.