2024-02-28 06:43:48

by Abdul Anshad Azeez

Subject: Network performance regression in Linux kernel 6.6 for small socket size test cases

During performance regression testing of the Linux kernel, we observed
up to a 30% performance decrease in a specific networking workload on
the 6.6 kernel compared to 6.5 (details below). The regression is
reproducible both in Linux VMs running on ESXi and on bare-metal Linux.

Workload details:

Benchmark - Netperf TCP_STREAM
Socket buffer size - 8K
Message size - 256B
MTU - 1500B
Socket option - TCP_NODELAY
# of STREAMs - 32
Direction - Uni-Directional Receive
Duration - 60 Seconds
NIC - Mellanox Technologies ConnectX-6 Dx EN 100G
Server Config - Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz & 512G Memory
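
For reference, a minimal sketch of the receiver-side socket setup this
workload implies (hypothetical code, not netperf's own; bind/listen
logic and most error handling are omitted):

/* Hypothetical receiver setup matching the workload above:
 * fixed 8K receive buffer plus TCP_NODELAY. Illustration only.
 */
#include <stdio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int rcvbuf = 8 * 1024;  /* "Socket buffer size - 8K" */
	int one = 1;            /* "Socket option - TCP_NODELAY" */

	if (fd < 0) {
		perror("socket");
		return 1;
	}
	/* Setting SO_RCVBUF pins the buffer and disables TCP receive
	 * autotuning for this socket; the kernel stores double the
	 * requested value to leave room for metadata overhead. */
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0)
		perror("setsockopt SO_RCVBUF");
	if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) < 0)
		perror("setsockopt TCP_NODELAY");

	/* ... bind/listen/accept, then receive 256B messages for 60s ... */
	close(fd);
	return 0;
}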

A bisect between the 6.5 and 6.6 kernels identified the commit below
as the source of this regression:

commit - dfa2f0483360d4d6f2324405464c9f281156bd87 ("tcp: get rid of
sysctl_tcp_adv_win_scale")
Author - Eric Dumazet <[email protected]>
Link - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dfa2f0483360d4d6f2324405464c9f281156bd87

Performance data (Linux VM on ESXi):
Test case - TCP_STREAM_RECV Throughput in Gbps
(for different socket buffer sizes and with constant message size - 256B):

Socket buffer size - [LK6.5 vs LK6.6]
8K - [8.4 vs 5.9 Gbps]
16K - [13.4 vs 10.6 Gbps]
32K - [19.1 vs 16.3 Gbps]
64K - [19.6 vs 19.7 Gbps]
Autotune - [19.7 vs 19.6 Gbps]

From the above performance data, we can infer that:
* The regression is specific to smaller fixed socket buffer sizes
(8K, 16K & 32K).
* Increasing the socket buffer size gradually reduces the throughput impact.
* Performance is on par for the larger fixed socket buffer size (64K) and
the autotuned socket tests.
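
As a quick sanity check on the "up to 30%" figure, the per-case deltas
can be computed directly from the table above (throwaway arithmetic,
using only the numbers reported):

/* Per-case throughput drop from 6.5 to 6.6, from the table above. */
#include <stdio.h>

int main(void)
{
	const char *label[] = { "8K", "16K", "32K", "64K" };
	double v65[] = { 8.4, 13.4, 19.1, 19.6 };
	double v66[] = { 5.9, 10.6, 16.3, 19.7 };

	for (int i = 0; i < 4; i++)
		printf("%-3s: %+.1f%%\n", label[i],
		       100.0 * (v66[i] - v65[i]) / v65[i]);
	/* Prints about -29.8%, -20.9%, -14.7% and +0.5%: the impact
	 * shrinks as the fixed buffer grows. */
	return 0;
}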

We would like to know if there are any opportunities for optimization in
the test cases with small socket sizes.

Abdul Anshad Azeez
Performance Engineering
Broadcom Inc.



2024-02-28 08:33:12

by Bagas Sanjaya

Subject: Re: Network performance regression in Linux kernel 6.6 for small socket size test cases

[also Cc: regressions ML]

On Wed, Feb 28, 2024 at 12:13:27PM +0530, Abdul Anshad Azeez wrote:
> During performance regression testing of the Linux kernel, we observed
> up to a 30% performance decrease in a specific networking workload on
> the 6.6 kernel compared to 6.5 (details below). The regression is
> reproducible both in Linux VMs running on ESXi and on bare-metal Linux.
>
> [...]
>
> We would like to know if there are any opportunities for optimization in
> the test cases with small socket sizes.
>

Can you verify the regression on current mainline (v6.8-rc6)?

--
An old man doll... just what I always wanted! - Clara



2024-02-28 08:48:35

by Eric Dumazet

Subject: Re: Network performance regression in Linux kernel 6.6 for small socket size test cases

On Wed, Feb 28, 2024 at 7:43 AM Abdul Anshad Azeez
<[email protected]> wrote:
>
> During performance regression testing of the Linux kernel, we observed
> up to a 30% performance decrease in a specific networking workload on
> the 6.6 kernel compared to 6.5 (details below). The regression is
> reproducible both in Linux VMs running on ESXi and on bare-metal Linux.
>
> [...]
>
> We would like to know if there are any opportunities for optimization in
> the test cases with small socket sizes.
>

Sure, I would suggest not setting small SO_RCVBUF values in 2024, or
you get what you ask for (going back to the TCP performance of the
year 2010).

Back in 2018, we set tcp_rmem[1] to 131072 for a good reason.

commit a337531b942bd8a03e7052444d7e36972aac2d92
Author: Yuchung Cheng <[email protected]>
Date: Thu Sep 27 11:21:19 2018 -0700

tcp: up initial rmem to 128KB and SYN rwin to around 64KB


I cannot enforce a minimum SO_RCVBUF (other than the small one added in
commit eea86af6b1e18d6fa8dc959e3ddc0100f27aff9f ("net: sock: adapt
SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF")); otherwise many test programs
that expect to set a low value would break.
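
To make that concrete, here is a small hypothetical probe (not part of
the benchmark in this thread) that prints the effective receive buffer
for an autotuned socket versus a pinned one; the exact numbers depend
on the running kernel's net.ipv4.tcp_rmem:

/* Compare the default (autotunable) receive buffer with an explicitly
 * pinned 8K one. Illustration only; values vary with tcp_rmem.
 */
#include <stdio.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

static void print_rcvbuf(const char *label, int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, &len) == 0)
		printf("%s: SO_RCVBUF = %d bytes\n", label, val);
}

int main(void)
{
	int autotuned = socket(AF_INET, SOCK_STREAM, 0);
	int pinned = socket(AF_INET, SOCK_STREAM, 0);
	int req = 8 * 1024;

	/* Untouched socket: starts at tcp_rmem[1] (131072 by default
	 * since 2018) and autotuning may grow it up to tcp_rmem[2]. */
	print_rcvbuf("autotuned", autotuned);

	/* Pinned socket: autotuning is off, and the kernel stores
	 * double the requested value (16384 here) to account for
	 * metadata overhead. */
	setsockopt(pinned, SOL_SOCKET, SO_RCVBUF, &req, sizeof(req));
	print_rcvbuf("pinned   ", pinned);

	close(autotuned);
	close(pinned);
	return 0;
}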


by Thorsten Leemhuis

Subject: Re: Network performance regression in Linux kernel 6.6 for small socket size test cases

On 28.02.24 09:32, Bagas Sanjaya wrote:
> [also Cc: regressions ML]
>
> On Wed, Feb 28, 2024 at 12:13:27PM +0530, Abdul Anshad Azeez wrote:
>> During performance regression testing of the Linux kernel, we observed
>> up to a 30% performance decrease in a specific networking workload on
>> the 6.6 kernel compared to 6.5 (details below). The regression is
>> reproducible both in Linux VMs running on ESXi and on bare-metal Linux.
>>
>> [...]
>>
>> We would like to know if there are any opportunities for optimization in
>> the test cases with small socket sizes.
>
> Can you verify the regression on current mainline (v6.8-rc6)?

Bagas, I know that you are trying to help, but this is not helpful at
all (and indirectly puts regression tracking and the kernel development
community into a bad light).

Asking that question can be the right thing sometimes, for example in a
bugzilla ticket where the reporter is clearly reporting their first bug.
But the quoted report above clearly does not fall into that category for
various obvious reasons.

If you want to ensure that reports like that are acted upon, wait at
least two or three work days and see if there is a reply from a
developer. In case there is none (which happens, but is likely rare
for a report like this, I assume), prodding a bit can be okay. But even
then you definitely want to use a more friendly tone. Maybe something
like "None of the developers reacted yet; maybe none of them bothered to
take a closer look because it's unclear if the problem still happens
with the latest code. You thus might want to verify and report back if
the problem happens with latest mainline, maybe then someone will take a
closer look".

Okay, that has way too many "maybe" in it, but I'm sure you'll get the
idea. :-D

Ciao, Thorsten




2024-02-28 12:02:25

by Bagas Sanjaya

Subject: Re: Network performance regression in Linux kernel 6.6 for small socket size test cases

On 2/28/24 16:09, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 28.02.24 09:32, Bagas Sanjaya wrote:
>> [also Cc: regressions ML]
>>
>> On Wed, Feb 28, 2024 at 12:13:27PM +0530, Abdul Anshad Azeez wrote:
>>> During performance regression testing of the Linux kernel, we observed
>>> up to a 30% performance decrease in a specific networking workload on
>>> the 6.6 kernel compared to 6.5 (details below). The regression is
>>> reproducible both in Linux VMs running on ESXi and on bare-metal Linux.
>>>
>>> [...]
>>>
>>> We would like to know if there are any opportunities for optimization in
>>> the test cases with small socket sizes.
>>
>> Can you verify the regression on current mainline (v6.8-rc6)?
>
> Bagas, I know that you are trying to help, but this is not helpful at
> all (and indirectly puts regression tracking and the kernel development
> community into a bad light).
>
> [...]
>

Oops, I was impatient in this case (and forgot to mail you privately).
Sorry for the inconvenience.

--
An old man doll... just what I always wanted! - Clara


2024-03-06 12:52:05

by Eric Dumazet

Subject: Re: Network performance regression in Linux kernel 6.6 for small socket size test cases

On Wed, Mar 6, 2024 at 1:43 PM Boon Ang <[email protected]> wrote:
>
> Hello Eric,
>
> The choice of socket buffer size is something that an application can decide, and there may be reasons to keep to smaller sizes. While high-bandwidth transfers obviously should use larger sizes, a change that regresses the performance of an existing configuration is a regression. Is there any way to modify your change so that it keeps the benefits while avoiding the degradation for small socket sizes?
>


The kernel limits the amount of memory used by the receive queue.

The problem is that for XXXX bytes of payload (what the user
application wants), the metadata overhead is not fixed.

Kernel structures change over time, and packets from the remote peer
(which we cannot control) are not always full.

1000 bytes of payload might fit in 2KB, or take 2MB, depending on how
the bytes are spread over multiple skbs.
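
A rough back-of-the-envelope version of that spread (the per-skb
overhead below is an assumed round number, not an exact kernel
constant):

/* Illustrative arithmetic only: per-skb metadata overhead is taken as
 * roughly 1KB; real truesize varies by kernel version and driver.
 */
#include <stdio.h>

int main(void)
{
	long payload = 1000;           /* bytes the application wants */
	long overhead_per_skb = 1024;  /* assumed metadata per skb */

	/* Best case: all 1000 bytes arrive in a single skb. */
	printf("1 skb:     ~%ld bytes of kernel memory\n",
	       payload + overhead_per_skb);

	/* Worst case: one byte per skb (e.g. interactive traffic),
	 * giving the same order of magnitude as the 2MB above. */
	printf("1000 skbs: ~%ld bytes of kernel memory\n",
	       payload * (1 + overhead_per_skb));
	return 0;
}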

This issue has been there forever; the kernel cannot put in stone any
rule of the form:

XXXX bytes of payload ---> YYYY bytes of kernel memory to hold XXXX
bytes of payload.
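
For the curious, the before/after of the bisected commit can be
sketched numerically. This is a simplified model, not the kernel code:
it assumes the pre-6.6 default tcp_adv_win_scale of 1 and the commit's
initial scaling ratio of roughly 25% (the real ratio is later adjusted
from the payload/truesize of received skbs, and tends to stay low for
tiny 256B messages):

/* Simplified model of tcp_win_from_space() before and after commit
 * dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale").
 * Constants are approximations; treat this as a sketch.
 */
#include <stdio.h>

/* Pre-6.6: window = space - (space >> tcp_adv_win_scale), default 1. */
static int win_old(int space)
{
	int tcp_adv_win_scale = 1;

	return space - (space >> tcp_adv_win_scale);
}

/* 6.6+: window = space * scaling_ratio / 256, with the ratio measured
 * from how efficiently payload packs into kernel memory. */
static int win_new(int space, int scaling_ratio)
{
	return (int)(((long long)space * scaling_ratio) >> 8);
}

int main(void)
{
	int space = 2 * 8 * 1024; /* SO_RCVBUF 8K, doubled by the kernel */

	printf("old: %d bytes of window\n", win_old(space));
	printf("new: %d bytes of window\n", win_new(space, 64)); /* ~25% */
	return 0;
}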

It is time that applications setting tiny SO_RCVBUF values get what they want:

Poor TCP performance.

Thanks.

> Thanks
> Boon
>
> On Wed, Feb 28, 2024 at 12:48 AM Eric Dumazet <[email protected]> wrote:
>>
>> On Wed, Feb 28, 2024 at 7:43 AM Abdul Anshad Azeez
>> <[email protected]> wrote:
>> >
>> > During performance regression testing of the Linux kernel, we observed
>> > up to a 30% performance decrease in a specific networking workload on
>> > the 6.6 kernel compared to 6.5 (details below). The regression is
>> > reproducible both in Linux VMs running on ESXi and on bare-metal Linux.
>> >
>> > [...]
>> >
>> > We would like to know if there are any opportunities for optimization in
>> > the test cases with small socket sizes.
>> >
>>
>> Sure, I would suggest not setting small SO_RCVBUF values in 2024, or
>> you get what you ask for (going back to the TCP performance of the
>> year 2010).
>>
>> Back in 2018, we set tcp_rmem[1] to 131072 for a good reason.
>>
>> commit a337531b942bd8a03e7052444d7e36972aac2d92
>> Author: Yuchung Cheng <[email protected]>
>> Date: Thu Sep 27 11:21:19 2018 -0700
>>
>> tcp: up initial rmem to 128KB and SYN rwin to around 64KB
>>
>>
>> I cannot enforce a minimum SO_RCVBUF (other than the small one added in
>> commit eea86af6b1e18d6fa8dc959e3ddc0100f27aff9f ("net: sock: adapt
>> SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF")); otherwise many test programs
>> that expect to set a low value would break.