From: Willem de Bruijn
Date: Sun, 8 Jan 2023 14:37:52 -0500
Subject: Re: kernel BUG in __ip_make_skb()
To: Fedor Pchelkin
Cc: Eric Dumazet, Jakub Kicinski, Paolo Abeni, "David S. Miller", netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Alexey Khoroshilov, lvc-project@linuxtesting.org
References: <20230105211636.40616-1-pchelkin@ispras.ru>
In-Reply-To: <20230105211636.40616-1-pchelkin@ispras.ru>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 5, 2023 at 4:17 PM Fedor Pchelkin wrote:
>
> Syzkaller reports the following crash:
>
> kernel BUG at include/linux/skbuff.h:2311!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
> CPU: 2 PID: 4615 Comm: syz-executor260 Not tainted 5.10.152-syzkaller #0
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> RIP: 0010:__skb_pull include/linux/skbuff.h:2311 [inline]
> RIP: 0010:__ip_make_skb.cold+0x5b/0x5d net/ipv4/ip_output.c:1507
> Code: 79 a2 f9 8b 44 24 3c 89 ee 48 c7 c7 00 2e de 88 41 89 45 70 e8 d1 84 de ff 31 d2 4c 89 ee 48 c7 c7 40 2e de 88 e8 a1 26 ff ff <0f> 0b e8 63 79 a2 f9 e8 4e f7 e2 f9 48 c7 c7 40 39 de 88 e8 5e c1
> RSP: 0018:ffff88801e0af698 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: ffff88814c408288 RCX: ffffffff87ce00ec
> RDX: ffff888024a11ac0 RSI: ffffffff87ceccc6 RDI: 0000000000000001
> RBP: 0000000000000028 R08: 000000000000003b R09: ffff8880b8438ba7
> R10: 0000000000000000 R11: 0000000000000001 R12: ffff8881436a9cc0
> R13: ffff8881436a9cc0 R14: ffff8881436a9e00 R15: dffffc0000000000
> FS: 00007f2750eb7700(0000) GS:ffff8880b8500000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f4438a29010 CR3: 0000000021f58000 CR4: 0000000000350ee0
> Call Trace:
>  ip_finish_skb include/net/ip.h:241 [inline]
>  udp_push_pending_frames net/ipv4/udp.c:979 [inline]
>  udp_sendpage+0x36e/0x570 net/ipv4/udp.c:1354
>  inet_sendpage+0xd3/0x140 net/ipv4/af_inet.c:831
>  kernel_sendpage.part.0+0x13c/0x280 net/socket.c:3514
>  kernel_sendpage net/socket.c:3511 [inline]
>  sock_sendpage+0xe5/0x140 net/socket.c:944
>  pipe_to_sendpage+0x2af/0x380 fs/splice.c:364
>  splice_from_pipe_feed fs/splice.c:418 [inline]
>  __splice_from_pipe+0x3e5/0x840 fs/splice.c:562
>  splice_from_pipe fs/splice.c:597 [inline]
>  generic_splice_sendpage+0xd4/0x140 fs/splice.c:743
>  do_splice_from fs/splice.c:764 [inline]
>  direct_splice_actor+0x10f/0x170 fs/splice.c:933
>  splice_direct_to_actor+0x38f/0x990 fs/splice.c:888
>  do_splice_direct+0x1b3/0x280 fs/splice.c:976
>  do_sendfile+0x553/0x10a0 fs/read_write.c:1257
>  __do_sys_sendfile64 fs/read_write.c:1318 [inline]
>  __se_sys_sendfile64 fs/read_write.c:1304 [inline]
>  __x64_sys_sendfile64+0x1d0/0x210 fs/read_write.c:1304
>  do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
>  entry_SYSCALL_64_after_hwframe+0x61/0xc6
>
> It was actually found on a 5.10 kernel instance but I didn't find any
> upstream commit referencing something of that kind so this bug is likely
> to be in upstream, too. The reproducers unfortunately do not work on my
> machine, but I'll add one in the next email for additional info.
>
> From here on, the terminology follows this fragment of __ip_make_skb():
> ---
>         if (skb->data < skb_network_header(skb))
>                 __skb_pull(skb, skb_network_offset(skb));
>         while ((tmp_skb = __skb_dequeue(queue)) != NULL) {
>                 __skb_pull(tmp_skb, skb_network_header_len(skb)); <-- BUG is here
> ---
>
> We get the first fragment (called 'skb') from the queue and then start
> getting the other fragments ('tmp_skb') from the queue and combine them
> into the first fragment.
>
> The problem is that the difference between tmp_skb->len and
> tmp_skb->data_len is smaller than the length of the skb network (IP)
> header, so while doing __skb_pull(), tmp_skb->len becomes smaller than
> tmp_skb->data_len, which triggers the BUG.
>
> Something is probably wrong with the IP header layout of the first
> fragment (the SKB into which we are combining the other ones). It is 40
> bytes long, and the tmp_skb IP header's length is 20 bytes (see the debug
> info below). The first fragment, however, can contain some specific IP
> options which are stored only in this fragment and are not included in the
> following ones. On the other hand, the problem can be with tmp_skb where
> something caused its data_len to be incorrect.

The assumption in the code that the network header length is the same
for all fragments apparently does not hold.

ip_append_page reserves head room as follows:

    fragheaderlen = sizeof(struct iphdr) + (opt ? opt->optlen : 0);
    [..]
    skb_reset_network_header(skb);
    skb->transport_header = (skb->network_header + fragheaderlen);

The reproducer you shared makes no explicit system call that modifies
optional header length, like IP_OPTIONS. But it opens a
PF_INET6/SOCK_DGRAM socket, and yet the stack trace shows an IPv4
stack. The repro shows multiple connect calls. Perhaps it manages to
switch between IPv6 and v4-mapped-v6 socket in between
sendmsg/sendpage calls?

Changing the argument to skb_network_header_len in
`__skb_pull(tmp_skb, skb_network_header_len(skb));` might
superficially address the out-of-bounds read. But it should probably
not be possible to switch between IPv6 and IPv4 while a datagram is
being constructed.

> Maybe there is some sanity check missing while constructing an IP datagram
> in ip_append_page()? Additional check of extra IP headers or MTU
> values...?
>
> We managed to get some debug info about the failing SKBs.
>
> tmp_skb info:
> skb len=1476 headroom=160 headlen=20 tailroom=0
> mac=(-1,-1) net=(160,20) trans=180
> shinfo(txflags=0 nr_frags=1 gso(size=0 type=0 segs=0))
> csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
> hash(0xe62f05e3 sw=0 l4=1) proto=0x0000 pkttype=0 iif=0
> sk family=2 type=2 proto=17
> skb linear: 00000000: 00 00 00 00 00 00 00 00 30 06 a9 86 ff ff ff ff
> skb linear: 00000010: 02 00 00 00
> skb frag: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 000000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000110: 00 00 00 00 00 00 00 00 00 00 00 00
>
> skb info:
> skb len=1476 headroom=168 headlen=241 tailroom=0
> mac=(-1,-1) net=(168,40) trans=208
> shinfo(txflags=0 nr_frags=1 gso(size=0 type=0 segs=0))
> csum(0x3f886391 ip_summed=0 complete_sw=0 valid=0 level=0)
> hash(0xe62f05e3 sw=0 l4=1) proto=0x86dd pkttype=0 iif=0
> sk family=2 type=2 proto=17
> skb linear: 00000000: 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00
> skb linear: 00000010: 80 44 f9 8a ff ff ff ff 00 00 00 00 00 00 00 00
> skb linear: 00000020: 00 00 00 00 00 00 00 00 fe ff ff ff 00 00 00 00
> skb linear: 00000030: f3 57 61 ad ca 6f 38 25 00 62 d5 b1 7d e1 d0 94
> skb linear: 00000040: 04 ae 20 57 20 1a 06 db 10 92 76 4f 8d 2e af 83
> skb linear: 00000050: 91 0c c3 cd b5 d1 96 9e c8 7e c8 e5 90 a9 be aa
> skb linear: 00000060: ae f8 7d 1d bb af 99 62 36 3f c9 a3 44 4e 18 fa
> skb linear: 00000070: 0e 5f 40 32 59 ad 8b 90 89 df 79 63 13 80 da 2b
> skb linear: 00000080: de 47 62 24 61 fd 47 d8 89 4a 74 8a 91 32 aa c6
> skb linear: 00000090: ad 59 30 2f 2c 2e 94 9f 83 00 46 5b f9 11 98 a9
> skb linear: 000000a0: cb ed ca cb 8d 70 d8 78 4c 95 b6 12 af 8b 81 33
> skb linear: 000000b0: 13 58 9d 74 ef 30 91 1d 10 bd 55 22 67 6b b9 43
> skb linear: 000000c0: 97 72 5c a2 c7 24 df f4 2c 3f b8 5e cb 3a 10 f6
> skb linear: 000000d0: 10 5f 3a 11 32 2a d5 22 b2 14 73 c0 1a df b0 6f
> skb linear: 000000e0: 3c 98 91 df bf d2 b6 99 ee 3c fe 91 98 f6 55 20
> skb linear: 000000f0: 45
> skb frag: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb frag: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> More info:
> tmp_skb->len = 1476; tmp_skb->data_len = 1456
> skb->len = 1476; skb->data_len = 1235
>
> I actually found some commits with resembling call trace but they don't help
> me that much to solve the issue:
> 10b8a3de603d ("ipv6: the entire IPv6 header chain must fit the first fragment")
> e9d3f80935b6 ("net/af_packet: make sure to pull mac header")
> 501a90c94510 ("inet: protect against too small mtu values.")
>
> Fedor