Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp167573pxb; Wed, 3 Feb 2021 02:28:38 -0800 (PST) X-Google-Smtp-Source: ABdhPJyR1hUNhc/hBot/TqYAWEPftX37K4QGVqlLUOJ0VIa1y2gGUM0RDFnc4jGReaqhYmWPX8Ud X-Received: by 2002:a05:6402:270d:: with SMTP id y13mr2194033edd.149.1612348118337; Wed, 03 Feb 2021 02:28:38 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1612348118; cv=none; d=google.com; s=arc-20160816; b=dQQ4J9Z3Y8Q3261ysmHS60vmlBuQn7pQNdmHr1uruBxmr72tiKVi8Zqg0G/EM0om6M PIVcMKzS2M5SWLmTGrDjSmK6vw2YJrE1Pws12SdTPrrIHz4dLQ+8cr9ctLQBm2ovEHvb GRj8Y97WvXp/VbgtfoZi9tWm8gwqKzG32G9jAXcubdbnrpx5nfG5jaB64GN7qpSCrLTi xgRxcJvQlR9cZ1GBxFqVIIbJMkRdZNsG9vHg0JZwM735SPlbWZDOfEkcFEdt2BSrC1Ye jv/L8i0YbC9rVvK/U7PjXBhz4FfqOBoxTI6Py9Z6e67OEW2F71ouzpp1g07wMYOFoBHN emmA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=EtTD4b2iNg/VqSN2G6Jpuwam+M/5nbFDPRiVPwvCE/4=; b=T+4QNAilc+b/rd12WU2jQT6iyMtoRGlF9IJuXBQENLVYosviReKviqKSjTu+6D4wn1 xTD4AIK94oxmQgXQiONZFsto6+kJ1z2L0MoBO6aQwU83ZGkvQAYe4sWZLYv0a+hA55h0 Fi9eVOHIR6sWiFLWGmKK0DbUyTyd5StQ5aCQNi9hj1cFxDsRoiui30gYSsvnvKGAywiK ATZRuqS+sgLLGguwoQQ6F64SvsHksaFp+gVHQ73bYbRh6IZdm0yF5YkBCx3wK9Nk56bN 3E30n/Eu4FOH/jLsORIPWuJ36e1V0p+RT2ZqpT7jm81V1ix0TG7OsC59CExcU7Taaz2s VEoA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=doDAlVtY; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id q16si914037edc.229.2021.02.03.02.28.13; Wed, 03 Feb 2021 02:28:38 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=doDAlVtY; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233857AbhBCK0y (ORCPT + 99 others); Wed, 3 Feb 2021 05:26:54 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59074 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233839AbhBCK0a (ORCPT ); Wed, 3 Feb 2021 05:26:30 -0500 Received: from mail-oo1-xc2e.google.com (mail-oo1-xc2e.google.com [IPv6:2607:f8b0:4864:20::c2e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7A1A9C0613D6 for ; Wed, 3 Feb 2021 02:25:50 -0800 (PST) Received: by mail-oo1-xc2e.google.com with SMTP id r199so5902512oor.2 for ; Wed, 03 Feb 2021 02:25:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=EtTD4b2iNg/VqSN2G6Jpuwam+M/5nbFDPRiVPwvCE/4=; b=doDAlVtYqNalDhx3M1Z3Zl4NU0+FkBEjVi9RO1YAo+bS1qx0SJRigN75Ir4J5DvqEQ t7QV9Jv56Llcmt7VDTd0tZoFHnNoGziGSc+O9IBHZzGUDkwGNUCGSyZ+hveA1CYu+pa3 O8r7Oxoca8JFvA/BceDKyZi+i/YViiZCgOKSOXr3f/lgZR8YYC97+0oIuvS07FkR+CVX 1l651DBSdS9Qtxfzn0+KI15so99yUcmpKW2dNNg18jDuGviJbP56Y0Dqiw93Ha2Rccwu uCJKj6KOHuBAygQwK/uO2eWWNM7r4lXkV5ygQCmJdKDV/6gzJAKRvGPvaI6ZNk4CuMig f8YA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=EtTD4b2iNg/VqSN2G6Jpuwam+M/5nbFDPRiVPwvCE/4=; b=mH5i0rS5AaVUURtJHgn4b9ELk7jMaYVzTsE5dWudMNDcfFhxowD6qH5sNUns9qp0RC 2MHvynKkfjmRp+lzg4yLewsJ4Onr0mV11opqkoZ7MfO0CYqR9eYXYNvOKCpK8t/o89Oz M0uvtjMMKjpZyzGfyRYNq9xXj3Q1CfIjs54T8WKLQewT683Tgm/JGT3qtNY6r9ZTLXe7 7lIOLb7GBnuxWQcKx3lA2bJlQNdTqgIzGPUcXAvzfjVb63pEfUwjX9wJeqJKGLUQBKPf M8cta7w5I1WnwueH+Drl+f5nrVykJ8l+23AZ57eiVhlWyo95eQsYCEqQifYQOjryUu56 23Zw== X-Gm-Message-State: AOAM533LG8jglMh6x310cYM1F927qW9LRc4UMJTldeFuGDWTHZZl0Geo euNGwVmuuaQNhDxwikkRe2uHxhby0/i92ngddMGtKA== X-Received: by 2002:a4a:d384:: with SMTP id i4mr1608153oos.14.1612347949546; Wed, 03 Feb 2021 02:25:49 -0800 (PST) MIME-Version: 1.0 References: <000000000000b4862805b54ef573@google.com> In-Reply-To: From: Marco Elver Date: Wed, 3 Feb 2021 11:25:37 +0100 Message-ID: Subject: Re: WARNING in sk_stream_kill_queues (5) To: Eric Dumazet Cc: Eric Dumazet , netdev , Andrew Morton , David Miller , Dmitry Vyukov , Alexander Potapenko , Jann Horn , Jakub Kicinski , LKML , Stephen Rothwell , syzkaller-bugs , Willem de Bruijn , syzbot Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 14 Dec 2020 at 11:09, Marco Elver wrote: > On Thu, 10 Dec 2020 at 20:01, Marco Elver wrote: > > On Thu, 10 Dec 2020 at 18:14, Eric Dumazet wrote: > > > On Thu, Dec 10, 2020 at 5:51 PM Marco Elver wrote: > > [...] > > > > So I started putting gdb to work, and whenever I see an allocation > > > > exactly like the above that goes through tso_fragment() a warning > > > > immediately follows. > > > > > > > > Long story short, I somehow synthesized this patch that appears to fix > > > > things, but I can't explain why exactly: > > > > > > > > | --- a/net/core/skbuff.c > > > > | +++ b/net/core/skbuff.c > > > > | @@ -1679,13 +1679,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, > > > > | > > > > | skb_metadata_clear(skb); > > > > | > > > > | - /* It is not generally safe to change skb->truesize. > > > > | - * For the moment, we really care of rx path, or > > > > | - * when skb is orphaned (not attached to a socket). > > > > | - */ > > > > | - if (!skb->sk || skb->destructor == sock_edemux) > > > > | - skb->truesize += size - osize; > > > > | - > > > > | return 0; > > > > | > > > > | nofrags: > > > > > > > > Now, here are the breadcrumbs I followed: > > > > > > > > > > > > 1. Breakpoint on kfence_ksize() -- first allocation that matches the above: > > > > > > > > | #0 __kfence_ksize (s=18446612700164612096) at mm/kfence/core.c:726 > > > > | #1 0xffffffff816fbf30 in kfence_ksize (addr=0xffff888436856000) at mm/kfence/core.c:737 > > > > | #2 0xffffffff816217cf in ksize (objp=0xffff888436856000) at mm/slab_common.c:1178 > > > > | #3 0xffffffff84896911 in __alloc_skb (size=914710528, gfp_mask=2592, flags=0, node=-1) at net/core/skbuff.c:217 > > > > | #4 0xffffffff84d0ba73 in alloc_skb_fclone (priority=, size=) at ./include/linux/skbuff.h:1144 > > > > | #5 sk_stream_alloc_skb (sk=0xffff8881176cc000, size=0, gfp=2592, force_schedule=232) at net/ipv4/tcp.c:888 > > > > | #6 0xffffffff84d41c36 in tso_fragment (gfp=, mss_now=, len=, > > > > | skb=, sk=) at net/ipv4/tcp_output.c:2124 > > > > | #7 tcp_write_xmit (sk=0xffff8881176cc000, mss_now=21950, nonagle=3096, push_one=-1996874776, gfp=0) > > > > | at net/ipv4/tcp_output.c:2674 > > > > | #8 0xffffffff84d43e48 in __tcp_push_pending_frames (sk=0xffff8881176cc000, cur_mss=337, nonagle=0) > > > > | at ./include/net/sock.h:918 > > > > | #9 0xffffffff84d3259c in tcp_push_pending_frames (sk=) at ./include/net/tcp.h:1864 > > > > | #10 tcp_data_snd_check (sk=) at net/ipv4/tcp_input.c:5374 > > > > | #11 tcp_rcv_established (sk=0xffff8881176cc000, skb=0x0 ) at net/ipv4/tcp_input.c:5869 > > > > | #12 0xffffffff84d56731 in tcp_v4_do_rcv (sk=0xffff8881176cc000, skb=0xffff888117f52ea0) at net/ipv4/tcp_ipv4.c:1668 > > > > | [...] > > > > > > > > Set watchpoint on skb->truesize: > > > > > > > > | (gdb) frame 3 > > > > | #3 0xffffffff84896911 in __alloc_skb (size=914710528, gfp_mask=2592, flags=0, node=-1) at net/core/skbuff.c:217 > > > > | 217 size = SKB_WITH_OVERHEAD(ksize(data)); > > > > | (gdb) p &skb->truesize > > > > | $5 = (unsigned int *) 0xffff888117f55f90 > > > > | (gdb) awatch *0xffff888117f55f90 > > > > | Hardware access (read/write) watchpoint 6: *0xffff888117f55f90 > > > > > > > > 2. Some time later, we see that the skb with kfence-allocated data > > > > is cloned: > > > > > > > > | Thread 7 hit Hardware access (read/write) watchpoint 6: *0xffff888117f55f90 > > > > | > > > > | Value = 1570 > > > > | 0xffffffff84886947 in __skb_clone (n=0xffff888117f55fa0, skb=0xffff888117f55ec0) at net/core/skbuff.c:1002 > > > > | 1002 C(truesize); > > > > | (gdb) bt > > > > | #0 0xffffffff84886947 in __skb_clone (n=0xffff888117f55fa0, skb=0xffff888117f55ec0) at net/core/skbuff.c:1002 > > > > | #1 0xffffffff8488bfb9 in skb_clone (skb=0xffff888117f55ec0, gfp_mask=2592) at net/core/skbuff.c:1454 > > > > | #2 0xffffffff84d3cd1c in __tcp_transmit_skb (sk=0xffff8881176cc000, skb=0xffff888117f55ec0, clone_it=0, gfp_mask=2592, > > > > | rcv_nxt=0) at net/ipv4/tcp_output.c:1267 > > > > | #3 0xffffffff84d4125b in tcp_transmit_skb (gfp_mask=, clone_it=, skb=, > > > > | sk=) at ./include/linux/tcp.h:439 > > > > | #4 tcp_write_xmit (sk=0xffff8881176cc000, mss_now=392485600, nonagle=1326, push_one=-1996875104, gfp=0) > > > > | at net/ipv4/tcp_output.c:2688 > > > > | #5 0xffffffff84d43e48 in __tcp_push_pending_frames (sk=0xffff8881176cc000, cur_mss=337, nonagle=0) > > > > | at ./include/net/sock.h:918 > > > > | #6 0xffffffff84d3259c in tcp_push_pending_frames (sk=) at ./include/net/tcp.h:1864 > > > > | #7 tcp_data_snd_check (sk=) at net/ipv4/tcp_input.c:5374 > > > > | #8 tcp_rcv_established (sk=0xffff8881176cc000, skb=0x0 ) at net/ipv4/tcp_input.c:5869 > > > > | #9 0xffffffff84d56731 in tcp_v4_do_rcv (sk=0xffff8881176cc000, skb=0xffff888117f57820) at net/ipv4/tcp_ipv4.c:1668 > > > > | #10 0xffffffff8487bf67 in sk_backlog_rcv (skb=, sk=) at ./include/net/sock.h:1010 > > > > [...] > > > > > > > > > > > > 3. The original skb (that was cloned) has its truesize adjusted > > > > after a pskb_expand_head(): > > > > > > > > | Thread 2 hit Hardware access (read/write) watchpoint 6: *0xffff888117f55f90 > > > > | > > > > | Old value = 1570 > > > > | New value = 1954 > > > > > > > > ^^ the difference between the old and the new value is exactly > > > > 384, which is also the final underflow of the sk_wmem_queued > > > > that triggers the warning. Presumably if the original allocation > > > > had been through kmalloc-1k and not KFENCE, the difference here > > > > would have been 0, since ksize() of the original allocation in > > > > step (1) would have been 1024, and not 640 (difference of 384). > > > > > > > > | 0xffffffff8488d84b in pskb_expand_head (skb=0xffff888117f55ec0, nhead=401956752, ntail=1954, gfp_mask=2298092192) > > > > | at net/core/skbuff.c:1687 > > > > | 1687 skb->truesize += size - osize; > > > > | (gdb) bt > > > > | #0 0xffffffff8488d84b in pskb_expand_head (skb=0xffff888117f55ec0, nhead=401956752, ntail=1954, gfp_mask=2298092192) > > > > | at net/core/skbuff.c:1687 > > > > | #1 0xffffffff8488de01 in skb_prepare_for_shift (skb=) at ./arch/x86/include/asm/atomic.h:29 > > > > | #2 skb_prepare_for_shift (skb=0xffff888117f55ec0) at net/core/skbuff.c:3276 > > > > | #3 0xffffffff848936b1 in skb_shift (tgt=0xffff888117f549c0, skb=0xffff888117f55ec0, shiftlen=674) at net/core/skbuff.c:3351 > > > > | #4 0xffffffff84d264de in tcp_skb_shift (shiftlen=, pcount=, from=, > > > > | to=) at net/ipv4/tcp_input.c:1497 > > > > | #5 tcp_shift_skb_data (dup_sack=, end_seq=, start_seq=, state=, > > > > | skb=, sk=) at net/ipv4/tcp_input.c:1605 > > > > | #6 tcp_sacktag_walk (skb=0xffff888117f55ec0, sk=0xffff8881176cc000, next_dup=0x894, > > > > | state=0xffffffff88fa1aa0 , start_seq=0, end_seq=401956752, dup_sack_in=false) > > > > | at net/ipv4/tcp_input.c:1670 > > > > | #7 0xffffffff84d276de in tcp_sacktag_write_queue (sk=0xffff888117f55f90, ack_skb=0x1888117f55f90, prior_snd_una=2196, > > > > | state=0xffffffff88fa1aa0 ) at net/ipv4/tcp_input.c:1931 > > > > | #8 0xffffffff84d2ca1d in tcp_ack (sk=0xffff8881176cc000, skb=0x1888117f55f90, flag=16643) at net/ipv4/tcp_input.c:3758 > > > > | #9 0xffffffff84d32387 in tcp_rcv_established (sk=0xffff8881176cc000, skb=0xffff888117f54020) at net/ipv4/tcp_input.c:5858 > > > > | #10 0xffffffff84d56731 in tcp_v4_do_rcv (sk=0xffff8881176cc000, skb=0xffff888117f54020) at net/ipv4/tcp_ipv4.c:1668 > > > > [...] > > > > > > > > > > > > Any of this make sense? > > > > > > Very nice debugging ! > > > > > > I guess we could fix this in skb_prepare_for_shift(), eventually > > > caring for the truesize manipulation > > > (or reverting the change done in pskb_expand_head(), since only kfence > > > is having this issue. > > > > Phew, good to hear I finally got lucky. :-) > > > > Either option is fine, as long as it avoids this problem in future. > > Hopefully it can be fixed for 5.11. > > > > > (All TCP skbs in output path have the same allocation size for skb->head) > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > index e578544b2cc7110ec2f6bcf4c29d93e4b4b1ad14..798b51eeeaa4fbed65d41d9eab207dbbf438dab3 > > > 100644 > > > --- a/net/core/skbuff.c > > > +++ b/net/core/skbuff.c > > > @@ -3270,7 +3270,14 @@ EXPORT_SYMBOL(skb_split); > > > */ > > > static int skb_prepare_for_shift(struct sk_buff *skb) > > > { > > > - return skb_cloned(skb) && pskb_expand_head(skb, 0, 0, GFP_ATOMIC); > > > + unsigned int ret = 0, save; > > > + > > > + if (skb_cloned(skb)) { > > > + save = skb->truesize; > > > + ret = pskb_expand_head(skb, 0, 0, GFP_ATOMIC); > > > + skb->truesize = save; > > > + } > > > + return ret; > > > } > > > > FWIW, > > > > Tested-by: Marco Elver > > Has this patch, or similar, already been sent? This is now in net-next: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=097b9146c0e2 Thanks, -- Marco