Subject: Re: [PATCH net-next v4] skb_expand_head() adjust skb->truesize incorrectly
From: Vasily Averin
To: Eric Dumazet, Christoph Paasch, "David S. Miller"
Cc: Hideaki YOSHIFUJI, David Ahern, Jakub Kicinski, netdev, linux-kernel@vger.kernel.org, kernel@openvz.org, Alexey Kuznetsov, Julian Wiedmann
Date: Thu, 2 Sep 2021 11:31:59 +0300
Message-ID: <27f87dd8-f6e4-b2b0-2b3a-9378fddf147f@virtuozzo.com>
In-Reply-To: <2984f16b-7f20-e72d-1661-b942fdc4ff9b@virtuozzo.com>

On 9/2/21 10:33 AM, Vasily Averin wrote:
> On 9/2/21 10:13 AM, Vasily Averin wrote:
>> On 9/2/21 7:48 AM, Eric Dumazet wrote:
>>> On 9/1/21 9:32 PM, Eric Dumazet wrote:
>>>> I think you missed netem case, in particular
>>>> skb_orphan_partial() which I already pointed out.
>>>>
>>>> You can setup a stack of virtual devices (tunnels),
>>>> with a qdisc on them, before ip6_xmit() is finally called...
>>>>
>>>> Socket might have been closed already.
>>>>
>>>> To test your patch, you could force a skb_orphan_partial() at the beginning
>>>> of skb_expand_head() (extending code coverage)
>>>
>>> To clarify :
>>>
>>> It is ok to 'downgrade' an skb->destructor having a ref on sk->sk_wmem_alloc to
>>> something owning a ref on sk->refcnt.
>>>
>>> But the opposite operation (ref on sk->sk_refcnt --> ref on sk->sk_wmem_alloc) is not safe.
>>
>> Could you please explain in more details, since I still have a completely opposite point of view?
>>
>> Every sk referenced in skb have sk_wmem_alloc > 0
>> It is assigned to 1 in sk_alloc and decremented right before last __sk_free(),
>> inside both sk_free() sock_wfree() and __sock_wfree()
>>
>> So it is safe to adjust skb->sk->sk_wmem_alloc,
>> because alive skb keeps reference to alive sk and last one keeps sk_wmem_alloc > 0
>>
>> So any destructor used sk->sk_refcnt will already have sk_wmem_alloc > 0,
>> because last sock_put() calls sk_free().
>>
>> However now I'm not sure in reversed direction.
>> skb_set_owner_w() check !sk_fullsock(sk) and call sock_hold(sk);
>> If sk->sk_refcnt can be 0 here (i.e. after execution of old destructor inside skb_orphan)
>> -- it can be trigger pointed problem:
>> "refcount_add() will trigger a warning (panic under KASAN)".
>>
>> Could you please explain where I'm wrong?
>
> To clarify:
> I'm agree it is unsafe to call on alive skb:

I explained the problem badly in my previous letter,
so let me repeat it once again.

I'm talking about this piece of code:

+       } else if (sk && skb->destructor != sock_edemux) {
+               delta = osize - skb_end_offset(skb);
+               if (!is_skb_wmem(skb))
+                       skb_set_owner_w(skb, sk);
+               skb->truesize += delta;
+               if (sk_fullsock(sk))
+                       refcount_add(delta, &sk->sk_wmem_alloc);
        }

It is called on a live, expanded skb, and it is incorrect for two reasons:

a) if the old destructor holds a ref on sk->sk_wmem_alloc,
   the counter can drop to 0 and release sk.
b) if the old destructor holds a ref on sk->sk_refcnt and !sk_fullsock(sk),
   the old destructor can release the last reference and release sk.
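
To make a) and b) easier to follow, here is a rough userspace sketch of the
two counters. It is a toy model and not kernel code: all toy_* names are
invented for this illustration, plain ints stand in for refcount_t, a flag
stands in for __sk_free(), and sk_fullsock() is not modelled at all. It only
assumes what was said above: sk_wmem_alloc starts at 1, the last sock_put()
drops that initial 1, and the old destructor runs from skb_orphan() at the
start of skb_set_owner_w().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace stand-ins, NOT kernel code: every toy_* name is hypothetical. */
struct toy_sock {
	int wmem_alloc;	/* models sk->sk_wmem_alloc */
	int refcnt;	/* models sk->sk_refcnt */
	bool released;	/* set where the kernel would reach __sk_free() */
};

struct toy_skb {
	struct toy_sock *sk;
	int truesize;
	void (*destructor)(struct toy_skb *skb);
};

/* Drop n from wmem_alloc; reaching 0 models __sk_free(). */
static void toy_wmem_sub(struct toy_sock *sk, int n)
{
	assert(!sk->released && sk->wmem_alloc >= n);
	sk->wmem_alloc -= n;
	if (sk->wmem_alloc == 0)
		sk->released = true;
}

/* Models sock_put(): the last refcnt drops the initial wmem_alloc ref. */
static void toy_sock_put(struct toy_sock *sk)
{
	assert(sk->refcnt > 0);
	if (--sk->refcnt == 0)
		toy_wmem_sub(sk, 1);
}

/* Destructor that holds a ref on wmem_alloc (sock_wfree()-like). */
static void toy_wfree(struct toy_skb *skb)
{
	toy_wmem_sub(skb->sk, skb->truesize);
}

/* Destructor that holds a ref on refcnt (sock_edemux()-like). */
static void toy_edemux(struct toy_skb *skb)
{
	toy_sock_put(skb->sk);
}

/* Models skb_orphan(): run the old destructor, then detach the sock. */
static void toy_skb_orphan(struct toy_skb *skb)
{
	if (skb->destructor)
		skb->destructor(skb);
	skb->destructor = NULL;
	skb->sk = NULL;
}

/*
 * Attach an skb to a sock, optionally close the socket, then orphan the
 * skb the way skb_set_owner_w() does as its first step.
 * Returns true if the orphan step released the sock.
 */
static bool toy_orphan_releases_sk(bool ref_on_wmem, bool socket_closed)
{
	struct toy_sock sk = { .wmem_alloc = 1, .refcnt = 1, .released = false };
	struct toy_skb skb = { .sk = &sk, .truesize = 256, .destructor = NULL };

	if (ref_on_wmem) {
		sk.wmem_alloc += skb.truesize;	/* case a) */
		skb.destructor = toy_wfree;
	} else {
		sk.refcnt++;			/* case b) */
		skb.destructor = toy_edemux;
	}
	if (socket_closed)
		toy_sock_put(&sk);	/* the socket owner drops its reference */

	toy_skb_orphan(&skb);
	return sk.released;
}
```

In this model toy_orphan_releases_sk(true, true) and
toy_orphan_releases_sk(false, true) both return true: once the socket is
closed, the skb holds the last reference, so orphaning it releases the sock
before any new reference could be taken. With the socket still open, both
calls return false.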

We can work around the release of sk by moving
refcount_add(delta, &sk->sk_wmem_alloc) before skb_set_owner_w():

        } else if (sk && skb->destructor != sock_edemux) {
                delta = osize - skb_end_offset(skb);
                refcount_add(delta, &sk->sk_wmem_alloc);
                if (!is_skb_wmem(skb))
                        skb_set_owner_w(skb, sk);
                skb->truesize += delta;
#ifdef CONFIG_INET
                if (!sk_fullsock(sk))
                        WARN_ON(refcount_sub_and_test(delta, &sk->sk_wmem_alloc));
#endif
        }

However, it does not resolve b) completely:

void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
{
        skb_orphan(skb); <<< old destructor releases last sk->sk_refcnt
        ...
        skb->sk = sk;
        ...
        if (unlikely(!sk_fullsock(sk))) {
                skb->destructor = sock_edemux;
                sock_hold(sk); <<<< ...and it triggers a warning/panic
                return;
        }

Thank you,
        Vasily Averin