Received: by 2002:a05:6a10:d5a5:0:0:0:0 with SMTP id gn37csp3923563pxb; Mon, 4 Oct 2021 12:45:24 -0700 (PDT) X-Google-Smtp-Source: ABdhPJy3VKj/LccGPErdk9AvE1Tk+uPmC8XYaDsYKxeVLfxNdZwobVgIv1ZX9Y5XjL6+GyMPnPyY X-Received: by 2002:a17:906:b05a:: with SMTP id bj26mr18936146ejb.226.1633376724197; Mon, 04 Oct 2021 12:45:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1633376724; cv=none; d=google.com; s=arc-20160816; b=ByW4UsqUCd547cxDqN1yC0E5XplmHw99lxjO60U3b7fJ0UDKmEk765Mwjf+6l3C1zN nIqyZ5gJcf9Ah6bGAxlTGfZIAIK/NAP4gLYM4j8oOnjg9m+53t4kFf5cLDN0SQ+3YWsa V9/vWcCihnqx8fvk5mONilb0jHVY2iPCzB86/dXlGIqm2yHyt0Zlib+GGKh7tg8Ihs4T aIMuCL+tryuW8V1v1k8MSrTQo8mlK00+Sno7RouQeVJ6ZiBy4eJbnMcnR0SKo2CnEqg6 748nKhmCSwtUMHxpbe3uWYnzArUJA8AcpLv89svAIEO3N9mcjX+DNtjgAerIk7OzjaOa cnIQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:references:cc :to:from:subject:dkim-signature; bh=WQ33uEg7GhDWMq/xO0qWy61MKydhEka6Yf8IGioQDH8=; b=a4RZsKjTZi3TK7qVIH8pZuuKSdea7OhQ0i9ov9ztPk8YukEQYzWTGwXiY74Jmaa7Mk UTDEK32+Wt0yBEVfgWPJ4Ku2jai+g8x33Aztq/CWdmBBCfkq11qXlfgwZdCYREaI0s5r aRduTiC4LEhkRPdHwJFhOTvvSbDYvBSIX8bFJOksBECodavS2JDEqPka2uMfChK4YToq claBJSwiDVpMg4QdtRe4nP4S1NfDu20nViM5D9HAi85ALO7H80vFTq5bXHQ+4NqJ8kO5 Oe+10y1IBPl7WnXa9X5oUoLY5b9eTehYwsAkuV0GnwoHo1KbUuhvHIRAlmUm1W3mWOpY 5BkA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@virtuozzo.com header.s=relay header.b=FDkS6AqW; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=virtuozzo.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id dq22si22655258ejc.83.2021.10.04.12.44.59; Mon, 04 Oct 2021 12:45:24 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@virtuozzo.com header.s=relay header.b=FDkS6AqW; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=virtuozzo.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236068AbhJDNTO (ORCPT + 99 others); Mon, 4 Oct 2021 09:19:14 -0400 Received: from relay.sw.ru ([185.231.240.75]:37906 "EHLO relay.sw.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236256AbhJDNQe (ORCPT ); Mon, 4 Oct 2021 09:16:34 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=virtuozzo.com; s=relay; h=Content-Type:MIME-Version:Date:Message-ID:From: Subject; bh=WQ33uEg7GhDWMq/xO0qWy61MKydhEka6Yf8IGioQDH8=; b=FDkS6AqWQ3Ld/xios dfXeQ8FJPTjLmkjavhRV2s0PKFZHhDigwYEYUTDS/q61oZjWMMZRstKMO1dIrLaourmMJITsI65za 9QZoABUDhD7xU3oKvEW/i/mx2ojR607UiI1pmZ4KGJgoRyMBbfEjeT997ZdrpAmW/jijdrEa2Q3nM =; Received: from [10.93.0.56] by relay.sw.ru with esmtp (Exim 4.94.2) (envelope-from ) id 1mXNnn-004uNU-UB; Mon, 04 Oct 2021 16:14:43 +0300 Subject: Re: [PATCH net v9] skb_expand_head() adjust skb->truesize incorrectly From: Vasily Averin To: Eric Dumazet , Jakub Kicinski Cc: netdev , "David S. Miller" , Hideaki YOSHIFUJI , David Ahern , Julian Wiedmann , Christoph Paasch , linux-kernel@vger.kernel.org, kernel@openvz.org References: <45b3cb13-8c6e-25a3-f568-921ab6f1ca8f@virtuozzo.com> <2bd9c638-3038-5aba-1dae-ad939e13c0c4@virtuozzo.com> Message-ID: <7d5f8955-56d4-d5a2-c74e-ed6d19649abb@virtuozzo.com> Date: Mon, 4 Oct 2021 16:14:43 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0 MIME-Version: 1.0 In-Reply-To: <2bd9c638-3038-5aba-1dae-ad939e13c0c4@virtuozzo.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Dear Eric, could you please take look at this patch? Original issue reported by Christoph Paasch is still not fixed, however now buggy paches was merged into upstream. v6 patch version [1] fixes the problem by careful change of destructor on existing skb. I still think it is correct however I'm agree it requires careful review. [1] https://lkml.org/lkml/2021/9/6/584 This patch version is more simple and returns to cloning of non-wmem skb. Both variants (i.e. this one and v6) resolves the problem. Could you please review the patches, select one of them or propose some better solution? Thank you, Vasily Averin On 10/4/21 4:00 PM, Vasily Averin wrote: > Christoph Paasch reports [1] about incorrect skb->truesize > after skb_expand_head() call in ip6_xmit. > This may happen because of two reasons: > - skb_set_owner_w() for newly cloned skb is called too early, > before pskb_expand_head() where truesize is adjusted for (!skb-sk) case. > - pskb_expand_head() does not adjust truesize in (skb->sk) case. > In this case sk->sk_wmem_alloc should be adjusted too. > > [1] https://lkml.org/lkml/2021/8/20/1082 > > Fixes: f1260ff15a71 ("skbuff: introduce skb_expand_head()") > Fixes: 2d85a1b31dde ("ipv6: ip6_finish_output2: set sk into newly allocated nskb") > Reported-by: Christoph Paasch > Signed-off-by: Vasily Averin > --- > v9: restored sock_edemux check > v8: clone non-wmem skb > v7 (from kuba@): > shift more magic into helpers, > follow Eric's advice and don't inherit non-wmem skbs for now > v6: fixed delta, > improved comments > v5: fixed else condition, thanks to Eric > reworked update of expanded skb, > added corresponding comments > v4: decided to use is_skb_wmem() after pskb_expand_head() call > fixed 'return (EXPRESSION);' in os_skb_wmem according to Eric Dumazet > v3: removed __pskb_expand_head(), > added is_skb_wmem() helper for skb with wmem-compatible destructors > there are 2 ways to use it: > - before pskb_expand_head(), to create skb clones > - after successfull pskb_expand_head() to change owner on extended > skb. > v2: based on patch version from Eric Dumazet, > added __pskb_expand_head() function, which can be forced > to adjust skb->truesize and sk->sk_wmem_alloc. > --- > include/net/sock.h | 1 + > net/core/skbuff.c | 35 ++++++++++++++++++++++------------- > net/core/sock.c | 8 ++++++++ > 3 files changed, 31 insertions(+), 13 deletions(-) > > diff --git a/include/net/sock.h b/include/net/sock.h > index 66a9a90f9558..a547651d46a7 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -1695,6 +1695,7 @@ struct sk_buff *sock_wmalloc(struct sock *sk, unsigned long size, int force, > gfp_t priority); > void __sock_wfree(struct sk_buff *skb); > void sock_wfree(struct sk_buff *skb); > +bool is_skb_wmem(const struct sk_buff *skb); > struct sk_buff *sock_omalloc(struct sock *sk, unsigned long size, > gfp_t priority); > void skb_orphan_partial(struct sk_buff *skb); > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index 2170bea2c7de..8cb6d360cda5 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -1804,30 +1804,39 @@ EXPORT_SYMBOL(skb_realloc_headroom); > struct sk_buff *skb_expand_head(struct sk_buff *skb, unsigned int headroom) > { > int delta = headroom - skb_headroom(skb); > + int osize = skb_end_offset(skb); > + struct sock *sk = skb->sk; > > if (WARN_ONCE(delta <= 0, > "%s is expecting an increase in the headroom", __func__)) > return skb; > > - /* pskb_expand_head() might crash, if skb is shared */ > - if (skb_shared(skb)) { > + delta = SKB_DATA_ALIGN(delta); > + /* pskb_expand_head() might crash, if skb is shared. */ > + if (skb_shared(skb) || !is_skb_wmem(skb)) { > struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC); > > - if (likely(nskb)) { > - if (skb->sk) > - skb_set_owner_w(nskb, skb->sk); > - consume_skb(skb); > - } else { > - kfree_skb(skb); > - } > + if (unlikely(!nskb)) > + goto fail; > + > + if (sk) > + skb_set_owner_w(nskb, sk); > + consume_skb(skb); > skb = nskb; > } > - if (skb && > - pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC)) { > - kfree_skb(skb); > - skb = NULL; > + if (pskb_expand_head(skb, delta, 0, GFP_ATOMIC)) > + goto fail; > + > + if (sk && skb->destructor != sock_edemux) { > + delta = skb_end_offset(skb) - osize; > + refcount_add(delta, &sk->sk_wmem_alloc); > + skb->truesize += delta; > } > return skb; > + > +fail: > + kfree_skb(skb); > + return NULL; > } > EXPORT_SYMBOL(skb_expand_head); > > diff --git a/net/core/sock.c b/net/core/sock.c > index 62627e868e03..1932755ae9ba 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -2227,6 +2227,14 @@ void skb_set_owner_w(struct sk_buff *skb, struct sock *sk) > } > EXPORT_SYMBOL(skb_set_owner_w); > > +bool is_skb_wmem(const struct sk_buff *skb) > +{ > + return skb->destructor == sock_wfree || > + skb->destructor == __sock_wfree || > + (IS_ENABLED(CONFIG_INET) && skb->destructor == tcp_wfree); > +} > +EXPORT_SYMBOL(is_skb_wmem); > + > static bool can_skb_orphan_partial(const struct sk_buff *skb) > { > #ifdef CONFIG_TLS_DEVICE >