From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Alexander Duyck, Jean-Philippe Brucker, Yunsheng Lin, Ilias Apalodimas, Jesper Dangaard Brouer, "David S. Miller", Sasha Levin
Subject: [PATCH 5.16 163/285] skbuff: fix coalescing for page_pool fragment recycling
Date: Tue, 12 Apr 2022 08:30:20 +0200
Message-Id: <20220412062948.374730117@linuxfoundation.org>
In-Reply-To: <20220412062943.670770901@linuxfoundation.org>
References: <20220412062943.670770901@linuxfoundation.org>

From: Jean-Philippe Brucker

[ Upstream commit 1effe8ca4e34c34cdd9318436a4232dcb582ebf4 ]

Fix a use-after-free when using page_pool with page fragments. We
encountered this problem during normal RX in the hns3 driver:

(1) Initially we have three descriptors in the RX queue. The first one
    allocates PAGE1 through page_pool, and the other two allocate one
    half of PAGE2 each. Page references look like this:

                RX_BD1 _______ PAGE1
                RX_BD2 _______ PAGE2
                RX_BD3 _________/

(2) Handle RX on the first descriptor. Allocate SKB1, eventually added
    to the receive queue by tcp_queue_rcv().

(3) Handle RX on the second descriptor.
    Allocate SKB2 and pass it to netif_receive_skb():

    netif_receive_skb(SKB2)
      ip_rcv(SKB2)
        SKB3 = skb_clone(SKB2)

    SKB2 and SKB3 share a reference to PAGE2 through
    skb_shinfo()->dataref. The other ref to PAGE2 is still held by
    RX_BD3:

                      SKB2 ---+- PAGE2
                      SKB3 __/   /
                RX_BD3 _________/

 (3b) Now while handling TCP, coalesce SKB3 with SKB1:

      tcp_v4_rcv(SKB3)
        tcp_try_coalesce(to=SKB1, from=SKB3)    // succeeds
        kfree_skb_partial(SKB3)
          skb_release_data(SKB3)                // drops one dataref

                      SKB1 _____ PAGE1
                           \____
                      SKB2 _____ PAGE2
                                 /
                RX_BD3 _________/

    In skb_try_coalesce(), __skb_frag_ref() takes a page reference to
    PAGE2, where it should instead have increased the page_pool frag
    reference, pp_frag_count. Without coalescing, when releasing both
    SKB2 and SKB3, a single reference to PAGE2 would be dropped. Now
    when releasing SKB1 and SKB2, two references to PAGE2 will be
    dropped, resulting in underflow.

 (3c) Drop SKB2:

      af_packet_rcv(SKB2)
        consume_skb(SKB2)
          skb_release_data(SKB2)                // drops second dataref
            page_pool_return_skb_page(PAGE2)    // drops one pp_frag_count

                      SKB1 _____ PAGE1
                           \____
                                 PAGE2
                                 /
                RX_BD3 _________/

(4) Userspace calls recvmsg()
    Copies SKB1 and releases it. Since SKB3 was coalesced with SKB1, we
    release the SKB3 page as well:

    tcp_eat_recv_skb(SKB1)
      skb_release_data(SKB1)
        page_pool_return_skb_page(PAGE1)
        page_pool_return_skb_page(PAGE2)        // drops second pp_frag_count

(5) PAGE2 is freed, but the third RX descriptor was still using it!
    In our case this causes IOMMU faults, but it would silently corrupt
    memory if the IOMMU was disabled.

Change the logic that checks whether pp_recycle SKBs can be coalesced.
We still reject differing pp_recycle between 'from' and 'to' SKBs, but
in order to avoid the situation described above, we also reject
coalescing when both 'from' and 'to' are pp_recycled and 'from' is
cloned.
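
The counting in steps (1) to (5) can be replayed with a small userspace
model. This is only a sketch under simplifying assumptions: struct
toy_page, struct toy_shinfo and the toy_* helpers are hypothetical names
invented for illustration, not kernel APIs, and the model tracks just the
two counters the description mentions (a plain page reference and
pp_frag_count).

```c
#include <stdbool.h>

/* Toy stand-in for struct page: only the two counters that matter here. */
struct toy_page {
	int page_ref;      /* plain page reference count */
	int pp_frag_count; /* page_pool fragment users */
};

/* Toy stand-in for the data area shared by an SKB and its clones. */
struct toy_shinfo {
	int dataref;           /* number of SKBs sharing this data */
	struct toy_page *page; /* fragment page behind the data */
};

/* Models page_pool_return_skb_page(): drop one fragment user and
 * report whether the page went back to the pool.
 */
static bool toy_pp_return(struct toy_page *page)
{
	return --page->pp_frag_count == 0;
}

/* Models skb_release_data() for a pp_recycle SKB: only the last
 * holder of the shared data releases the page.
 */
static bool toy_release_data(struct toy_shinfo *shinfo)
{
	if (--shinfo->dataref > 0)
		return false;
	return toy_pp_return(shinfo->page);
}

/* Replays steps (1) to (4). Returns true if PAGE2 is recycled while
 * RX_BD3 still owns its half of the page.
 */
static bool toy_replay(bool coalesce_clone)
{
	/* (1) PAGE2 is split between RX_BD2 and RX_BD3. */
	struct toy_page page2 = { .page_ref = 1, .pp_frag_count = 2 };
	struct toy_shinfo data2 = { .dataref = 1, .page = &page2 };
	bool skb1_has_page2 = false;
	bool recycled = false;

	data2.dataref++;                 /* (3) SKB3 = skb_clone(SKB2) */

	if (coalesce_clone) {            /* (3b) behaviour before the fix */
		page2.page_ref++;        /* __skb_frag_ref(): page ref only */
		skb1_has_page2 = true;   /* SKB1 now carries the PAGE2 frag */
		recycled |= toy_release_data(&data2); /* SKB3 freed */
	}

	recycled |= toy_release_data(&data2);         /* (3c) drop SKB2 */

	if (skb1_has_page2)              /* (4) release SKB1 */
		recycled |= toy_pp_return(&page2);
	else                             /* (4) release SKB3, kept separate */
		recycled |= toy_release_data(&data2);

	/* RX_BD3 never released its half on either path. */
	return recycled;
}
```

In this model toy_replay(true) reports the premature recycle from step
(5), because the extra page_ref taken at (3b) is never what the release
path drops, while toy_replay(false) leaves one pp_frag_count outstanding
for RX_BD3.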

The new logic allows coalescing a cloned pp_recycle SKB into a page
refcounted one, because in this case the release (4) will drop the
right reference, the one taken by skb_try_coalesce().

Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
Suggested-by: Alexander Duyck
Signed-off-by: Jean-Philippe Brucker
Reviewed-by: Yunsheng Lin
Reviewed-by: Alexander Duyck
Acked-by: Ilias Apalodimas
Acked-by: Jesper Dangaard Brouer
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin
---
 net/core/skbuff.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index fdd804120600..001152c8def9 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5369,11 +5369,18 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 	if (skb_cloned(to))
 		return false;
 
-	/* The page pool signature of struct page will eventually figure out
-	 * which pages can be recycled or not but for now let's prohibit slab
-	 * allocated and page_pool allocated SKBs from being coalesced.
+	/* In general, avoid mixing slab allocated and page_pool allocated
+	 * pages within the same SKB. However when @to is not pp_recycle and
+	 * @from is cloned, we can transition frag pages from page_pool to
+	 * reference counted.
+	 *
+	 * On the other hand, don't allow coalescing two pp_recycle SKBs if
+	 * @from is cloned, in case the SKB is using page_pool fragment
+	 * references (PP_FLAG_PAGE_FRAG). Since we only take full page
+	 * references for cloned SKBs at the moment that would result in
+	 * inconsistent reference counts.
 	 */
-	if (to->pp_recycle != from->pp_recycle)
+	if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from)))
 		return false;
 
 	if (len <= skb_tailroom(to)) {
-- 
2.35.1
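
For readers who want to see which combinations the changed condition
accepts, the old and new checks can be written as standalone predicates.
This is a minimal sketch: struct toy_skb and the toy_reject_* functions
are hypothetical names, with 'cloned' standing in for skb_cloned(from).
The 'to' SKB is never cloned at this point, since skb_try_coalesce()
returns early in that case.

```c
#include <stdbool.h>

/* Hypothetical reduced view of the SKB state the check looks at. */
struct toy_skb {
	bool pp_recycle; /* pages come from page_pool */
	bool cloned;     /* stands in for skb_cloned() */
};

/* Check before the patch: reject any pp_recycle mismatch. */
static bool toy_reject_old(struct toy_skb to, struct toy_skb from)
{
	return to.pp_recycle != from.pp_recycle;
}

/* Check after the patch: a cloned @from is treated as page refcounted,
 * so it may only be merged into a non-pp_recycle @to.
 */
static bool toy_reject_new(struct toy_skb to, struct toy_skb from)
{
	return to.pp_recycle != (from.pp_recycle && !from.cloned);
}
```

The two predicates differ only when 'from' is a cloned pp_recycle SKB:
with a pp_recycle 'to' the new check rejects the merge (the case from the
walkthrough), and with a non-pp_recycle 'to' it now allows it.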