Received: by 2002:a05:6358:4e97:b0:b3:742d:4702 with SMTP id ce23csp2304626rwb; Mon, 15 Aug 2022 02:59:41 -0700 (PDT) X-Google-Smtp-Source: AA6agR4xXkyUs80ZuZNQUebE+NhahoHt7ym8wHQxC4I3fNnJILDazXjhBc7FREStGxcxK/8Qfxh1 X-Received: by 2002:a17:907:8a0a:b0:730:a118:75de with SMTP id sc10-20020a1709078a0a00b00730a11875demr10325623ejc.189.1660557581063; Mon, 15 Aug 2022 02:59:41 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1660557581; cv=none; d=google.com; s=arc-20160816; b=HzL8qS240lK+uoZ7W08KjJLNtd8346dzEM8Y+GFLIt/+NTQFP73xuFbKJn+2Yt7S8s hIGccyPL8voFPYCG8HZD/PVKh2Opnz38CV3cJEDvIHpUp/YA3k+A6uZeW1TlUJiFsdC7 dTzeKVGt+dObJureQmiBIHsEJ0TJRDaNn0KvxATNdIzJ4wu1cSVXHsr/BsE8uclSKJ1q 7E+zITaSY7MHI5UFATzNNVcMKD5UAGYWghkXeoTrGGeriQM8wmgfciAnfQZpIA2PdTB/ WmoEp+9PSdrm+DwN+e1EoLFxOdiDrWHVKImosGW3Hyba3UHO8N+xfSFObpfu2y+ympcH a+5g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=aci8AIuDT5JVsC6JPu8DTEr1urQqgkWWIXp400OOFtE=; b=RWQtUohzEHz0IG9+117XjmdK3wreaeVcj7KUzrJUEA8SmY0vtqSVydbP+3uH2URkAY vOj6YxbM3WDQZqixQEab8J4aW0rug03eJ7by585PS5zCce1FVSmw3tyMia6ebi5YqYMS BkS7QV3iFyqx8pIo+tYSbIkOniPqLaGq5LNk9Io53Xv5wRbxpYtJoATrberMREo6cp5h 1bW7W8yUd5orKwmeASxTu+F8+/rPyHTjinXGelj4YXQyo+CMWOznTEY8byALhYu2fimz GAPqceTR/Zb0l+vQPxhnRDQdQhIBSP+UbGfxOqDtkR8eweLznUgKLmXnO+njWGDuANpM 8DIg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=R2Ofi4zy; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id l14-20020a170906794e00b0073111225ba5si7529595ejo.743.2022.08.15.02.59.15; Mon, 15 Aug 2022 02:59:41 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=R2Ofi4zy; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242157AbiHOJon (ORCPT + 99 others); Mon, 15 Aug 2022 05:44:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36448 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242142AbiHOJol (ORCPT ); Mon, 15 Aug 2022 05:44:41 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2148E2253A; Mon, 15 Aug 2022 02:44:40 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id A8B7261026; Mon, 15 Aug 2022 09:44:39 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 82466C433C1; Mon, 15 Aug 2022 09:44:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1660556679; bh=QaGBzqhn0ZD/7Zc89Ov2cg1W44KZCJ7lb2WJY5GCRe0=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=R2Ofi4zyad+oWz3HSL8eqIZkRZa1q6PYP3byrgrIvNeESt3UDcRvypPVdxOIa0rLu DJfDWXiIXk+lF+2z8fMZbipcFDoqefOkINi/1sKBUey2Ymdx3AYmxXsJ6A0tEfmY6o MGL2DWhpT+5/Rwrj3yZTlmHwSjFtDlF+RXnSbC7SDlhUp7ccqHs1/cifyQQ0V3V6T4 eUo1MzxbSFxwVkhE4lY/UCmKW1kbznB+YilWQ6Ot1hD/hxHDXIY4AI+8oMmFYjq+tv pEAxBC0eHySIMaUe2GNV6CCNmyd4r/vsF9MXNBW3H411HH5kEMvFGBcYFu4hGu/iDP av0IFTplRP1SA== Date: Mon, 15 Aug 2022 11:44:32 +0200 From: Christian Brauner To: Alexander Mikhalitsyn Cc: netdev@vger.kernel.org, "Denis V. Lunev" , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Daniel Borkmann , David Ahern , Yajun Deng , Roopa Prabhu , linux-kernel@vger.kernel.org, Alexey Kuznetsov , Konstantin Khorenko , kernel@openvz.org, devel@openvz.org Subject: Re: [PATCH v2 1/2] neigh: fix possible DoS due to net iface start/stop loop Message-ID: <20220815094432.tdqdfh3pwcfekegg@wittgenstein> References: <20220729103559.215140-1-alexander.mikhalitsyn@virtuozzo.com> <20220810160840.311628-1-alexander.mikhalitsyn@virtuozzo.com> <20220810160840.311628-2-alexander.mikhalitsyn@virtuozzo.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20220810160840.311628-2-alexander.mikhalitsyn@virtuozzo.com> X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Aug 10, 2022 at 07:08:39PM +0300, Alexander Mikhalitsyn wrote: > From: "Denis V. Lunev" > > Normal processing of ARP request (usually this is Ethernet broadcast > packet) coming to the host is looking like the following: > * the packet comes to arp_process() call and is passed through routing > procedure > * the request is put into the queue using pneigh_enqueue() if > corresponding ARP record is not local (common case for container > records on the host) > * the request is processed by timer (within 80 jiffies by default) and > ARP reply is sent from the same arp_process() using > NEIGH_CB(skb)->flags & LOCALLY_ENQUEUED condition (flag is set inside > pneigh_enqueue()) > > And here the problem comes. Linux kernel calls pneigh_queue_purge() > which destroys the whole queue of ARP requests on ANY network interface > start/stop event through __neigh_ifdown(). > > This is actually not a problem within the original world as network > interface start/stop was accessible to the host 'root' only, which > could do more destructive things. But the world is changed and there > are Linux containers available. Here container 'root' has an access > to this API and could be considered as untrusted user in the hosting > (container's) world. > > Thus there is an attack vector to other containers on node when > container's root will endlessly start/stop interfaces. We have observed > similar situation on a real production node when docker container was > doing such activity and thus other containers on the node become not > accessible. > > The patch proposed doing very simple thing. It drops only packets from > the same namespace in the pneigh_queue_purge() where network interface > state change is detected. This is enough to prevent the problem for the > whole node preserving original semantics of the code. This is how I'd do it as well. > > v2: > - do del_timer_sync() if queue is empty after pneigh_queue_purge() > > Cc: "David S. Miller" > Cc: Eric Dumazet > Cc: Jakub Kicinski > Cc: Paolo Abeni > Cc: Daniel Borkmann > Cc: David Ahern > Cc: Yajun Deng > Cc: Roopa Prabhu > Cc: Christian Brauner > Cc: netdev@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Cc: Alexey Kuznetsov > Cc: Alexander Mikhalitsyn > Cc: Konstantin Khorenko > Cc: kernel@openvz.org > Cc: devel@openvz.org > Investigated-by: Alexander Mikhalitsyn > Signed-off-by: Denis V. Lunev > --- > net/core/neighbour.c | 25 +++++++++++++++++-------- > 1 file changed, 17 insertions(+), 8 deletions(-) > > diff --git a/net/core/neighbour.c b/net/core/neighbour.c > index 54625287ee5b..19d99d1eff53 100644 > --- a/net/core/neighbour.c > +++ b/net/core/neighbour.c > @@ -307,14 +307,23 @@ static int neigh_del_timer(struct neighbour *n) > return 0; > } > > -static void pneigh_queue_purge(struct sk_buff_head *list) > +static void pneigh_queue_purge(struct sk_buff_head *list, struct net *net) > { > + unsigned long flags; > struct sk_buff *skb; > > - while ((skb = skb_dequeue(list)) != NULL) { > - dev_put(skb->dev); > - kfree_skb(skb); > + spin_lock_irqsave(&list->lock, flags); I'm a bit surprised to see a spinlock held around a while loop walking a linked list but that seems to be quite common in this file. I take it the lists are guaranteed to be short. > + skb = skb_peek(list); > + while (skb != NULL) { > + struct sk_buff *skb_next = skb_peek_next(skb, list); > + if (net == NULL || net_eq(dev_net(skb->dev), net)) { > + __skb_unlink(skb, list); > + dev_put(skb->dev); > + kfree_skb(skb); > + } > + skb = skb_next; > } > + spin_unlock_irqrestore(&list->lock, flags); > } > > static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev, > @@ -385,9 +394,9 @@ static int __neigh_ifdown(struct neigh_table *tbl, struct net_device *dev, > write_lock_bh(&tbl->lock); > neigh_flush_dev(tbl, dev, skip_perm); > pneigh_ifdown_and_unlock(tbl, dev); > - > - del_timer_sync(&tbl->proxy_timer); > - pneigh_queue_purge(&tbl->proxy_queue); > + pneigh_queue_purge(&tbl->proxy_queue, dev_net(dev)); > + if (skb_queue_empty_lockless(&tbl->proxy_queue)) > + del_timer_sync(&tbl->proxy_timer); > return 0; > } > > @@ -1787,7 +1796,7 @@ int neigh_table_clear(int index, struct neigh_table *tbl) > cancel_delayed_work_sync(&tbl->managed_work); > cancel_delayed_work_sync(&tbl->gc_work); > del_timer_sync(&tbl->proxy_timer); > - pneigh_queue_purge(&tbl->proxy_queue); > + pneigh_queue_purge(&tbl->proxy_queue, NULL); > neigh_ifdown(tbl, NULL); > if (atomic_read(&tbl->entries)) > pr_crit("neighbour leakage\n"); > -- > 2.36.1 >