From: Alexander Mikhalitsyn
To: netdev@vger.kernel.org
Cc: Alexander Mikhalitsyn, "David S. Miller", Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Daniel Borkmann, David Ahern, Yajun Deng, Roopa Prabhu,
    Christian Brauner, linux-kernel@vger.kernel.org, "Denis V .
    Lunev", Alexey Kuznetsov, Konstantin Khorenko, Pavel Tikhomirov,
    Andrey Zhadchenko, Alexander Mikhalitsyn, kernel@openvz.org,
    devel@openvz.org
Subject: [PATCH v2 0/2] neighbour: fix possible DoS due to net iface start/stop loop
Date: Wed, 10 Aug 2022 19:08:38 +0300
Message-Id: <20220810160840.311628-1-alexander.mikhalitsyn@virtuozzo.com>
In-Reply-To: <20220729103559.215140-1-alexander.mikhalitsyn@virtuozzo.com>
References: <20220729103559.215140-1-alexander.mikhalitsyn@virtuozzo.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Dear friends,

Recently, one of the OpenVZ users reported issues with the network
availability of some containers. It turned out that the cause was the
absence of ARP replies from the Host Node to requests for container IPs.

Of course, we started with tcpdump analysis and noticed that ARP requests
successfully arrive at the external interface of the problematic node.
So something was wrong on the kernel side.

I've played a lot with arping and perf in attempts to understand what's
happening. The key observation was that we experience issues only with
broadcast ARP requests (skb->pkt_type == PACKET_BROADCAST). For packets
with skb->pkt_type == PACKET_HOST everything works flawlessly.

Let me show a small piece of code:

static int arp_process(struct sock *sk, struct sk_buff *skb)
...
	if (NEIGH_CB(skb)->flags & LOCALLY_ENQUEUED ||
	    skb->pkt_type == PACKET_HOST ||
	    NEIGH_VAR(in_dev->arp_parms, PROXY_DELAY) == 0) { // reply instantly
		arp_send_dst(ARPOP_REPLY, ETH_P_ARP,
			     sip, dev, tip, sha,
			     dev->dev_addr, sha,
			     reply_dst);
	} else {
		pneigh_enqueue(&arp_tbl, // reply with delay
			       in_dev->arp_parms, skb);
		goto out_free_dst;
	}

The problem is that for PACKET_BROADCAST packets we delay replies and use
the pneigh_enqueue() function. For some reason, queued packets were lost
almost all the time! The cause of this behaviour is the
pneigh_queue_purge() function, which cleans up the whole queue and is
called every time some network device in the system goes link down.

neigh_ifdown -> pneigh_queue_purge

Now imagine that we have a node with 500+ containers running
microservices, and some of those microservices are buggy and constantly
restarting. In this case, pneigh_queue_purge() will be called very
frequently.

This problem is reproducible only with the so-called "host routed" setup.
The classical bridge + veth scheme is not affected.

Minimal reproducer

Suppose that we have a network 172.29.1.1/16 brd 172.29.255.255
and a free-to-use IP, let it be 172.29.128.3.

1. Network configuration. I'm showing the minimal configuration. It makes
no practical sense, as both veth devices stay in the same net namespace,
but for the sake of demonstration and simplicity it's okay.

ip l a veth31427 type veth peer name veth314271
ip l s veth31427 up
ip l s veth314271 up

# setup static arp entry and publish it
arp -Ds -i br0 172.29.128.3 veth31427 pub
# setup static route for this address
route add 172.29.128.3/32 dev veth31427

2. "attacker" side (kubernetes pod with buggy microservice :) )

unshare -n
ip l a type veth
ip l s veth0 up
ip l s veth1 up

for i in {1..100000}; do ip link set veth0 down; sleep 0.01; ip link set veth0 up; done

This will totally block ARP replies for the 172.29.128.3 address. Just try
# arping -I eth0 172.29.128.3 -c 4

Our proposal is simple:
1. Let's clean up the queue only partially: remove only the skbs that
belong to the net namespace of the adapter whose link went down.

2. Let's account the proxy_queue limit properly, per device. The current
limit check does not look fully correct, because we compare a per-device
configurable limit with the "global" qlen of proxy_queue.

Thanks,
Alex

v2:
	- only ("neigh: fix possible DoS due to net iface start/stop") is changed:
	  do del_timer_sync() if the queue is empty after pneigh_queue_purge()

Cc: "David S. Miller"
Cc: Eric Dumazet
Cc: Jakub Kicinski
Cc: Paolo Abeni
Cc: Daniel Borkmann
Cc: David Ahern
Cc: Yajun Deng
Cc: Roopa Prabhu
Cc: Christian Brauner
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Denis V. Lunev
Cc: Alexey Kuznetsov
Cc: Konstantin Khorenko
Cc: Pavel Tikhomirov
Cc: Andrey Zhadchenko
Cc: Alexander Mikhalitsyn
Cc: kernel@openvz.org
Cc: devel@openvz.org
Signed-off-by: Denis V. Lunev
Signed-off-by: Alexander Mikhalitsyn

Alexander Mikhalitsyn (1):
  neighbour: make proxy_queue.qlen limit per-device

Denis V. Lunev (1):
  neigh: fix possible DoS due to net iface start/stop loop

 include/net/neighbour.h |  1 +
 net/core/neighbour.c    | 46 +++++++++++++++++++++++++++++++++--------
 2 files changed, 38 insertions(+), 9 deletions(-)

-- 
2.36.1
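
For readers who want to see the two proposals side by side, here is a rough
userspace sketch. It is not the patch itself and does not use kernel APIs:
model_dev, model_pkt, model_queue and the model_*() functions are
hypothetical stand-ins invented for illustration, and the ">=" limit check
is an assumption of this model. It shows a shared queue where the limit is
accounted per device and where a purge drops only the entries that belong
to one net namespace.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for kernel objects; not kernel API. */
struct model_dev {
	int netns_id;    /* namespace the device lives in */
	int proxy_qlen;  /* per-device count of queued packets */
	int proxy_limit; /* per-device configurable limit */
};

struct model_pkt {
	struct model_dev *dev;
	struct model_pkt *next;
};

struct model_queue {
	struct model_pkt *head;
	struct model_pkt **tail; /* address of the terminating link */
};

static void model_queue_init(struct model_queue *q)
{
	q->head = NULL;
	q->tail = &q->head;
}

/*
 * Queue a delayed reply. The limit is checked against the device's own
 * counter, not against the length of the shared queue.
 * Returns 0 on success, -1 if the packet is dropped.
 */
static int model_enqueue(struct model_queue *q, struct model_dev *dev)
{
	struct model_pkt *p;

	if (dev->proxy_qlen >= dev->proxy_limit)
		return -1;
	p = malloc(sizeof(*p));
	if (!p)
		return -1;
	p->dev = dev;
	p->next = NULL;
	*q->tail = p;
	q->tail = &p->next;
	dev->proxy_qlen++;
	return 0;
}

/*
 * Drop only the packets whose device belongs to netns_id and leave the
 * rest of the queue untouched. Returns the number of dropped packets.
 */
static int model_purge_netns(struct model_queue *q, int netns_id)
{
	struct model_pkt **pp = &q->head;
	int dropped = 0;

	while (*pp) {
		struct model_pkt *p = *pp;

		if (p->dev->netns_id == netns_id) {
			*pp = p->next;       /* unlink the matching entry */
			p->dev->proxy_qlen--;
			free(p);
			dropped++;
		} else {
			pp = &p->next;
		}
	}
	q->tail = pp; /* pp now points at the terminating link */
	return dropped;
}
```

In this model, a device flapping in namespace 2 only ever removes its own
queued entries, so delayed replies queued for devices in namespace 1
survive the purge.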