Received: by 2002:ab2:2994:0:b0:1ef:ca3e:3cd5 with SMTP id n20csp459090lqb; Thu, 14 Mar 2024 17:08:27 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCW70FxfXnWqLfjkCzRieQT1cIEn7O4tQepaH1FW9fJxz9zJeoT8oQufQEm2AcyKZwfmZmswEydSyWK35H98l4XWA+NYtjenS1wut/hLTw== X-Google-Smtp-Source: AGHT+IGUJN7wSrV/JcVDUMnrURu83v/OA6nKPPkFiMaJ6e9sDW4uOP/44Tz9ApaeDx35d+r932bR X-Received: by 2002:a05:6a21:920f:b0:1a3:4a1a:52b2 with SMTP id tl15-20020a056a21920f00b001a34a1a52b2mr1441537pzb.46.1710461307437; Thu, 14 Mar 2024 17:08:27 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1710461307; cv=pass; d=google.com; s=arc-20160816; b=Kags5uMCa8TG+viggNQGwGuzp2fi0NuYTvmzoont8wnyIXUPiUqWvormTlK8JzO0Cu O6mZ9Ap7SWKEBvjhMm4VRtgXjgjtzGNz4GBdAfuKFGBgV7fEzjmaEx8HiH585aB+pTwQ Wd7HNunSFnwZ9q2Os0iXGPobvSio1Z4Z/L4D9eKFmEd9DqvV9PiVA4ekZly+uQ+sKkzZ BFgNPC5PN3fNCcFL3iyCtE0EPpkD677mnAguAY5FlTp1kTylqCIiul5Mijdlgboa5i2h zFAPTheJcRY0Thb9Muxa8Nbt4OQ+Eo4Y5oml4JYJX2s83k34UQ/H1tJ35+//XNXRZayC D7hw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=user-agent:in-reply-to:content-disposition:mime-version :list-unsubscribe:list-subscribe:list-id:precedence:references :message-id:subject:cc:to:from:date; bh=sBtVCwiSCJSWeJfdY2vlVSlrWHDu4aUTrfJx8Ot4qro=; fh=uGnWcIUj9S+KQgYlhwZk+OEQUP1flvNSllK62iA3qAE=; b=W5oXuozmVMeyjBPOzmToR1jOIgJSvMleqAuStig6EjAEsS06g8hyRp/1JadTBiftHN 44bvBP2w31aI3gD8kt54YtQSZprFJO/W7EUnfc595v1qBRr4ObWR0C+Q6v9KBdwlN/Aw AYQ9tro4+VdlU4/SKLvPNF1ZQNMan9K4Oi46jFYU68V45E3TrNQ+DvWQnzl98YOMIiUY /h404lwJPtnhBnGRBO+FVOcPSLpG73vNZLo2TfndDQu4U6dEYhgxMA+9jJKJE+WkNWbs bf+l7mnYsgZwHaRV5xnln4bQIS4R+4TZb38NuNKitTvRIMhiV7ZIUFpbpM88gRmoQeUo 6wHg==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1 spf=pass spfdomain=strlen.de); spf=pass (google.com: domain of linux-kernel+bounces-103930-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-103930-linux.lists.archive=gmail.com@vger.kernel.org" Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id jc29-20020a056a006c9d00b006e6b8f53512si2535458pfb.281.2024.03.14.17.08.27 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Mar 2024 17:08:27 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-103930-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; arc=pass (i=1 spf=pass spfdomain=strlen.de); spf=pass (google.com: domain of linux-kernel+bounces-103930-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-103930-linux.lists.archive=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 14064282C2F for ; Fri, 15 Mar 2024 00:08:27 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 08D8A10F4; Fri, 15 Mar 2024 00:08:16 +0000 (UTC) Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [91.216.245.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9637C635; Fri, 15 Mar 2024 00:08:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.216.245.30 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1710461295; cv=none; b=A1T04VZpGP+IWfIvgw/3F8y+MmRJ8GdHnrw1Of1E/9KYcOrMYGLI2syySx4lKdS65oNpx+8zYwj5abrG9EjdKlDGYgzPY/TKW44+MIEiF7rKJ4mnOF2N2mQefqH/Xt9M9vcB5WebKwh7KX8/ClV0Qh9UeNRHyRxnlLqe+ychy7U= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1710461295; c=relaxed/simple; bh=eYmW3Ftf7mefPPsE9pr/d71AHAAKr268pFGEIArfBlQ=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=seXNJ05Ru4yKOBoz1RBxsf/XyL2lN4iQiY9+7hjLe3cFM4LcigrHPu47o8SA0/ClEhW5uKUGfTe0B3cNfbqM+D8DBlGZAQr0WaI45v5t1Cy8BXCvUB9dhkAHoUpTzzsRRhMdAqNfR7E21cGiTZF2GewUVhGwe7AMyT70h7+tDSE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=strlen.de; spf=pass smtp.mailfrom=strlen.de; arc=none smtp.client-ip=91.216.245.30 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=strlen.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=strlen.de Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1rkv78-00029A-9V; Fri, 15 Mar 2024 01:07:58 +0100 Date: Fri, 15 Mar 2024 01:07:58 +0100 From: Florian Westphal To: Florian Westphal Cc: Eric Dumazet , xingwei lee , pabeni@redhat.com, davem@davemloft.net, kuba@kernel.org, linux-hams@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, ralf@linux-mips.org, syzkaller-bugs@googlegroups.com, samsun1006219@gmail.com Subject: Re: KASAN: slab-use-after-free Read in ip_finish_output Message-ID: <20240315000758.GA30611@breakpoint.cc> References: <20240306103632.GC4420@breakpoint.cc> <20240312132107.GA1529@breakpoint.cc> <20240314112650.GE1038@breakpoint.cc> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240314112650.GE1038@breakpoint.cc> User-Agent: Mutt/1.10.1 (2018-07-13) Florian Westphal wrote: > Eric Dumazet wrote: > > Thanks for taking a look Florian. > > > > Perhaps not messing with truesize at all would help ? > > > > Something based on this POC : > > > > spin_lock(&qp->q.lock); > > + if (!qp->q.sk) { > > + struct sock *sk = skb->sk; > > > > + if (sk && refcount_inc_not_zero(&sk->sk_refcnt)) > > + qp->q.sk = sk; > > Unfortunetely I did not get this to work. > > sk_refcnt is 0. sk is kept alive by sock_wfree destructor. > > I don't know how to recover from this, refcnt cannot be "repaired" > anymore. > > I could artificially inflate sk_wmem counter by 1, to prevent release, > but that needs yet another sock_wfree-like destructor. > > I'm already not sure how this existing scheme works, there are > multiple places that check for skb->destructor == sock_wfree, > yet we have is_skb_wmem helper that checks __sock_wfree and tcp_wfree. > > Removing defrag from output seems like best option, but it will > surely break some scenarios. > > Or, I could just fail reasm if sk refcount is already 0, that would > likely work too? I am going to test (and formally submit if no issues found): inet: inet_defrag: prevent sk release while still in use diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 3023bc2be6a1..b1aeed510f24 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -753,8 +753,6 @@ typedef unsigned char *sk_buff_data_t; * @list: queue head * @ll_node: anchor in an llist (eg socket defer_list) * @sk: Socket we are owned by - * @ip_defrag_offset: (aka @sk) alternate use of @sk, used in - * fragmentation management * @dev: Device we arrived on/are leaving by * @dev_scratch: (aka @dev) alternate use of @dev when @dev would be %NULL * @cb: Control buffer. Free for use by every layer. Put private vars here @@ -875,10 +873,7 @@ struct sk_buff { struct llist_node ll_node; }; - union { - struct sock *sk; - int ip_defrag_offset; - }; + struct sock *sk; union { ktime_t tstamp; diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c index 7072fc0783ef..fab1ea5df089 100644 --- a/net/ipv4/inet_fragment.c +++ b/net/ipv4/inet_fragment.c @@ -39,6 +39,7 @@ struct ipfrag_skb_cb { }; struct sk_buff *next_frag; int frag_run_len; + int ip_defrag_offset; }; #define FRAG_CB(skb) ((struct ipfrag_skb_cb *)((skb)->cb)) @@ -396,12 +397,12 @@ int inet_frag_queue_insert(struct inet_frag_queue *q, struct sk_buff *skb, */ if (!last) fragrun_create(q, skb); /* First fragment. */ - else if (last->ip_defrag_offset + last->len < end) { + else if (FRAG_CB(last)->ip_defrag_offset + last->len < end) { /* This is the common case: skb goes to the end. */ /* Detect and discard overlaps. */ - if (offset < last->ip_defrag_offset + last->len) + if (offset < FRAG_CB(last)->ip_defrag_offset + last->len) return IPFRAG_OVERLAP; - if (offset == last->ip_defrag_offset + last->len) + if (offset == FRAG_CB(last)->ip_defrag_offset + last->len) fragrun_append_to_last(q, skb); else fragrun_create(q, skb); @@ -418,13 +419,13 @@ int inet_frag_queue_insert(struct inet_frag_queue *q, struct sk_buff *skb, parent = *rbn; curr = rb_to_skb(parent); - curr_run_end = curr->ip_defrag_offset + + curr_run_end = FRAG_CB(curr)->ip_defrag_offset + FRAG_CB(curr)->frag_run_len; - if (end <= curr->ip_defrag_offset) + if (end <= FRAG_CB(curr)->ip_defrag_offset) rbn = &parent->rb_left; else if (offset >= curr_run_end) rbn = &parent->rb_right; - else if (offset >= curr->ip_defrag_offset && + else if (offset >= FRAG_CB(curr)->ip_defrag_offset && end <= curr_run_end) return IPFRAG_DUP; else @@ -438,7 +439,7 @@ int inet_frag_queue_insert(struct inet_frag_queue *q, struct sk_buff *skb, rb_insert_color(&skb->rbnode, &q->rb_fragments); } - skb->ip_defrag_offset = offset; + FRAG_CB(skb)->ip_defrag_offset = offset; return IPFRAG_OK; } @@ -448,13 +449,29 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb, struct sk_buff *parent) { struct sk_buff *fp, *head = skb_rb_first(&q->rb_fragments); - struct sk_buff **nextp; + struct sk_buff **nextp = NULL; + struct sock *sk = skb->sk; int delta; + if (sk) { + if (!refcount_inc_not_zero(&sk->sk_refcnt)) + return NULL; + + /* need to orphan now, both this function and + * inet_frag_reasm_finish alter skb->truesize, which + * messes up sk->sk_wmem accounting. + * skb->sk might have been passed as argument to dst->output + * and must remain valid until tx completes. + */ + skb_orphan(skb); + } + if (head != skb) { fp = skb_clone(skb, GFP_ATOMIC); - if (!fp) - return NULL; + if (!fp) { + head = skb; + goto out_restore_sk; + } FRAG_CB(fp)->next_frag = FRAG_CB(skb)->next_frag; if (RB_EMPTY_NODE(&skb->rbnode)) FRAG_CB(parent)->next_frag = fp; @@ -470,13 +487,14 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb, consume_skb(head); head = skb; } - WARN_ON(head->ip_defrag_offset != 0); + WARN_ON(FRAG_CB(head)->ip_defrag_offset != 0); + DEBUG_NET_WARN_ON_ONCE(head->sk); delta = -head->truesize; /* Head of list must not be cloned. */ if (skb_unclone(head, GFP_ATOMIC)) - return NULL; + goto out_restore_sk; delta += head->truesize; if (delta) @@ -492,7 +510,7 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb, clone = alloc_skb(0, GFP_ATOMIC); if (!clone) - return NULL; + goto out_restore_sk; skb_shinfo(clone)->frag_list = skb_shinfo(head)->frag_list; skb_frag_list_init(head); for (i = 0; i < skb_shinfo(head)->nr_frags; i++) @@ -509,6 +527,12 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb, nextp = &skb_shinfo(head)->frag_list; } +out_restore_sk: + if (sk) { + head->sk = sk; + head->destructor = sock_edemux; + } + return nextp; } EXPORT_SYMBOL(inet_frag_reasm_prepare); diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c index a4941f53b523..fb947d1613fe 100644 --- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -384,6 +384,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb) } skb_dst_drop(skb); + skb_orphan(skb); return -EINPROGRESS; insert_error: @@ -487,7 +488,6 @@ int ip_defrag(struct net *net, struct sk_buff *skb, u32 user) struct ipq *qp; __IP_INC_STATS(net, IPSTATS_MIB_REASMREQDS); - skb_orphan(skb); /* Lookup (or create) queue header */ qp = ip_find(net, ip_hdr(skb), user, vif); diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c index 1a51a44571c3..d0dcbaca1994 100644 --- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -294,6 +294,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb, } skb_dst_drop(skb); + skb_orphan(skb); return -EINPROGRESS; insert_error: @@ -469,7 +470,6 @@ int nf_ct_frag6_gather(struct net *net, struct sk_buff *skb, u32 user) hdr = ipv6_hdr(skb); fhdr = (struct frag_hdr *)skb_transport_header(skb); - skb_orphan(skb); fq = fq_find(net, fhdr->identification, user, hdr, skb->dev ? skb->dev->ifindex : 0); if (fq == NULL) {