2021-09-01 19:24:42

by Yunsheng Lin

[permalink] [raw]
Subject: [PATCH net-next] tcp: add tcp_tx_skb_cache_key checking in sk_stream_alloc_skb()

Since tcp_tx_skb_cache is disabled by default in:
commit 0b7d7f6b2208 ("tcp: add tcp_tx_skb_cache sysctl")

Add tcp_tx_skb_cache_key checking in sk_stream_alloc_skb() to
avoid possible branch-misses.

Signed-off-by: Yunsheng Lin <[email protected]>
---
Also, the sk->sk_tx_skb_cache may be both changed by allocation
and freeing side, I assume there may be some implicit protection
here too, such as the NAPI protection for rx?
---
net/ipv4/tcp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e8b48df..226ddef 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -866,7 +866,7 @@ struct sk_buff *sk_stream_alloc_skb(struct sock *sk, int size, gfp_t gfp,
{
struct sk_buff *skb;

- if (likely(!size)) {
+ if (static_branch_unlikely(&tcp_tx_skb_cache_key) && likely(!size)) {
skb = sk->sk_tx_skb_cache;
if (skb) {
skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
--
2.7.4


2021-09-01 20:05:17

by Paolo Abeni

[permalink] [raw]
Subject: Re: [PATCH net-next] tcp: add tcp_tx_skb_cache_key checking in sk_stream_alloc_skb()

On Wed, 2021-09-01 at 08:16 -0700, Eric Dumazet wrote:
> On Wed, Sep 1, 2021 at 8:06 AM Eric Dumazet <[email protected]> wrote:
> > On Wed, Sep 1, 2021 at 3:52 AM Paolo Abeni <[email protected]> wrote:
> > > On Wed, 2021-09-01 at 18:39 +0800, Yunsheng Lin wrote:
> > > > Since tcp_tx_skb_cache is disabled by default in:
> > > > commit 0b7d7f6b2208 ("tcp: add tcp_tx_skb_cache sysctl")
> > > >
> > > > Add tcp_tx_skb_cache_key checking in sk_stream_alloc_skb() to
> > > > avoid possible branch-misses.
> > > >
> > > > Signed-off-by: Yunsheng Lin <[email protected]>
> > >
> > > Note that MPTCP is currently exploiting sk->sk_tx_skb_cache. If we get
> > > this patch goes in as-is, it will break mptcp.
> > >
> > > One possible solution would be to let mptcp usage enable sk-
> > > > sk_tx_skb_cache, but that has relevant side effects on plain TCP.
> > >
> > > Another options would be re-work once again the mptcp xmit path to
> > > avoid using sk->sk_tx_skb_cache.
> > >
> >
> > Hmmm, I actually wrote a revert of this feature but forgot to submit
> > it last year.
> >
> > commit c36cfbd791f62c0f7c6b32132af59dfdbe6be21b (HEAD -> listener_scale4)
> > Author: Eric Dumazet <[email protected]>
> > Date: Wed May 20 06:38:38 2020 -0700
> >
> > tcp: remove sk_{tr}x_skb_cache
> >
> > This reverts the following patches :
> >
> > 2e05fcae83c41eb2df10558338dc600dc783af47 ("tcp: fix compile error
> > if !CONFIG_SYSCTL")
> > 4f661542a40217713f2cee0bb6678fbb30d9d367 ("tcp: fix zerocopy and
> > notsent_lowat issues")
> > 472c2e07eef045145bc1493cc94a01c87140780a ("tcp: add one skb cache for tx")
> > 8b27dae5a2e89a61c46c6dbc76c040c0e6d0ed4c ("tcp: add one skb cache for rx")
> >
> > Having a cache of one skb (in each direction) per TCP socket is fragile,
> > since it can cause a significant increase of memory needs,
> > and not good enough for high speed flows anyway where more than one skb
> > is needed.
> >
> > We want instead to add a generic infrastructure, with more flexible per-cpu
> > caches, for alien NUMA nodes.
> >
> > Signed-off-by: Eric Dumazet <[email protected]>
> >
> > I will update this commit to also remove the part in MPTCP.
> >
> > Let's remove this feature and replace it with something less costly.
>
> Paolo, can you work on MPTP side, so that my revert can be then applied ?

You are way too fast, I was still replying to your previous email,
asking if I could help :)

I'll a look ASAP. Please, allow for some latency: I'm way slower!

Cheers,

Paolo

2021-09-02 00:55:56

by Yunsheng Lin

[permalink] [raw]
Subject: Re: [PATCH net-next] tcp: add tcp_tx_skb_cache_key checking in sk_stream_alloc_skb()

On 2021/9/1 18:39, Yunsheng Lin wrote:
> Since tcp_tx_skb_cache is disabled by default in:
> commit 0b7d7f6b2208 ("tcp: add tcp_tx_skb_cache sysctl")
>
> Add tcp_tx_skb_cache_key checking in sk_stream_alloc_skb() to
> avoid possible branch-misses.
>
> Signed-off-by: Yunsheng Lin <[email protected]>
> ---
> Also, the sk->sk_tx_skb_cache may be both changed by allocation
> and freeing side, I assume there may be some implicit protection
> here too, such as the NAPI protection for rx?

Hi, Eric
Is there any implicit protection for sk->sk_tx_skb_cache?
As my understanding, sk_stream_alloc_skb() seems to be protected
by lock_sock(), and the sk_wmem_free_skb() seems to be mostly
happening in NAPI polling for TCP(when ack packet is received)
without lock_sock(), so it seems there is no protection here?

>

2021-09-02 01:53:36

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH net-next] tcp: add tcp_tx_skb_cache_key checking in sk_stream_alloc_skb()

On Wed, Sep 1, 2021 at 5:47 PM Yunsheng Lin <[email protected]> wrote:
>
> On 2021/9/1 18:39, Yunsheng Lin wrote:
> > Since tcp_tx_skb_cache is disabled by default in:
> > commit 0b7d7f6b2208 ("tcp: add tcp_tx_skb_cache sysctl")
> >
> > Add tcp_tx_skb_cache_key checking in sk_stream_alloc_skb() to
> > avoid possible branch-misses.
> >
> > Signed-off-by: Yunsheng Lin <[email protected]>
> > ---
> > Also, the sk->sk_tx_skb_cache may be both changed by allocation
> > and freeing side, I assume there may be some implicit protection
> > here too, such as the NAPI protection for rx?
>
> Hi, Eric
> Is there any implicit protection for sk->sk_tx_skb_cache?
> As my understanding, sk_stream_alloc_skb() seems to be protected
> by lock_sock(), and the sk_wmem_free_skb() seems to be mostly
> happening in NAPI polling for TCP(when ack packet is received)
> without lock_sock(), so it seems there is no protection here?
>

Please look again.
This is protected by socket lock of course.
Otherwise sk_mem_uncharge() would be very broken, sk->sk_forward_alloc
is not an atomic field.

TCP stack has no direct relation with NAPI.
It can run over loopback interface, no NAPI there.

2021-09-02 02:09:34

by Yunsheng Lin

[permalink] [raw]
Subject: Re: [PATCH net-next] tcp: add tcp_tx_skb_cache_key checking in sk_stream_alloc_skb()

On 2021/9/2 9:13, Eric Dumazet wrote:
> On Wed, Sep 1, 2021 at 5:47 PM Yunsheng Lin <[email protected]> wrote:
>>
>> On 2021/9/1 18:39, Yunsheng Lin wrote:
>>> Since tcp_tx_skb_cache is disabled by default in:
>>> commit 0b7d7f6b2208 ("tcp: add tcp_tx_skb_cache sysctl")
>>>
>>> Add tcp_tx_skb_cache_key checking in sk_stream_alloc_skb() to
>>> avoid possible branch-misses.
>>>
>>> Signed-off-by: Yunsheng Lin <[email protected]>
>>> ---
>>> Also, the sk->sk_tx_skb_cache may be both changed by allocation
>>> and freeing side, I assume there may be some implicit protection
>>> here too, such as the NAPI protection for rx?
>>
>> Hi, Eric
>> Is there any implicit protection for sk->sk_tx_skb_cache?
>> As my understanding, sk_stream_alloc_skb() seems to be protected
>> by lock_sock(), and the sk_wmem_free_skb() seems to be mostly
>> happening in NAPI polling for TCP(when ack packet is received)
>> without lock_sock(), so it seems there is no protection here?
>>
>
> Please look again.
> This is protected by socket lock of course.
> Otherwise sk_mem_uncharge() would be very broken, sk->sk_forward_alloc
> is not an atomic field.

Thanks for clarifying.
I have been looking for a point to implement the socket'pp_alloc_cache for
page pool, and sk_wmem_free_skb() seems like the place to avoid the
scalablity problem of ptr_ring in page pool.

The protection for sk_wmem_free_skb() is in tcp_v4_rcv(), right?
https://elixir.bootlin.com/linux/latest/source/net/ipv4/tcp_ipv4.c#L2081

>
> TCP stack has no direct relation with NAPI.
> It can run over loopback interface, no NAPI there.
> .
>