References: <20220619003919.394622-1-i.maximets@ovn.org> <20220622102813.GA24844@breakpoint.cc>
In-Reply-To: <20220622102813.GA24844@breakpoint.cc>
From: Eric Dumazet
Date: Wed, 22 Jun 2022 12:36:23 +0200
Subject: Re: [PATCH net] net: ensure all external references are released in deferred skbuffs
To: Florian Westphal
Cc: Ilya Maximets, netdev, "David S. Miller", dev@openvswitch.org, LKML, Jakub Kicinski, Paolo Abeni
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 22, 2022 at 12:28 PM Florian Westphal wrote:
>
> Eric Dumazet wrote:
> > On Sun, Jun 19, 2022 at 2:39 AM Ilya Maximets wrote:
> > >
> > > The Open vSwitch system test suite is broken because netfilter
> > > modules cannot be loaded/unloaded. A kworker thread gets trapped
> > > in an infinite loop while running a net cleanup inside
> > > nf_conntrack_cleanup_net_list(), because deferred skbuffs are still
> > > holding nfct references and are not being freed by their CPU cores.
> > >
> > > In general, the assumption that every CPU core will get an rx
> > > interrupt at some point in the near future doesn't seem correct.
> > > Devices are created and destroyed, interrupts are re-scheduled,
> > > and CPUs go online and offline dynamically. Any of these events
> > > may leave packets stuck in the defer list for a long time. That
> > > might be OK if they were just pieces of memory, but we can't
> > > afford them holding references to any other resources.
> > >
> > > In the case of OVS, the nfct reference keeps the kernel thread in a
> > > busy loop while it holds the 'pernet_ops_rwsem' semaphore. That
> > > blocks the later modprobe request from user space:
> > >
> > > # ps
> > > 299 root R 99.3 200:25.89 kworker/u96:4+
> > >
> > > # journalctl
> > > INFO: task modprobe:11787 blocked for more than 1228 seconds.
> > > Not tainted 5.19.0-rc2 #8
> > > task:modprobe state:D
> > > Call Trace:
> > >
> > > __schedule+0x8aa/0x21d0
> > > schedule+0xcc/0x200
> > > rwsem_down_write_slowpath+0x8e4/0x1580
> > > down_write+0xfc/0x140
> > > register_pernet_subsys+0x15/0x40
> > > nf_nat_init+0xb6/0x1000 [nf_nat]
> > > do_one_initcall+0xbb/0x410
> > > do_init_module+0x1b4/0x640
> > > load_module+0x4c1b/0x58d0
> > > __do_sys_init_module+0x1d7/0x220
> > > do_syscall_64+0x3a/0x80
> > > entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > >
> > > At this point the OVS testsuite is unresponsive and never recovers,
> > > because these skbuffs are never freed.
> > >
> > > The solution is to make sure no external references are attached to
> > > the skb before pushing it to the defer list, using
> > > skb_release_head_state() for that purpose. The function is modified
> > > to be re-enterable, as it will be called again during the defer list
> > > flush.
> > >
> > > Another approach that can fix the OVS use case is to kick all
> > > cores while waiting for references to be released during the net
> > > cleanup. But that sounds more like a workaround for the current
> > > issue than a proper solution, and it will not cover possible
> > > issues in other parts of the code.
> > >
> > > Additionally, check for skb_zcopy() while deferring. This might
> > > not be necessary, as I'm not sure if we can actually have zero-copy
> > > packets on this path, but it seems worth having for completeness,
> > > as we should never defer such packets regardless.
> > >
> > > CC: Eric Dumazet
> > > Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists")
> > > Signed-off-by: Ilya Maximets
> > > ---
> > > net/core/skbuff.c | 16 +++++++++++-----
> > > 1 file changed, 11 insertions(+), 5 deletions(-)
> >
> > I do not think this patch is doing the right thing.
> >
> > Packets sitting in TCP receive queues should not hold state that is
> > not relevant for TCP recvmsg().
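The failure mode and the fix described in the quoted patch can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: struct pkt, struct ext_ref, struct defer_list and every function below are hypothetical stand-ins for an skb, an nfct reference and a per-CPU defer list. In the actual patch the release is done with skb_release_head_state() in net/core/skbuff.c. The model shows why a packet parked on an idle CPU's defer list keeps a reference pinned, and why clearing the pointer on release makes a second release during the flush harmless.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an external resource such as a conntrack
 * entry: it can only be torn down once its refcount reaches zero. */
struct ext_ref {
    int refcnt;
};

/* Hypothetical, heavily simplified packet buffer. */
struct pkt {
    struct ext_ref *ct;   /* external reference, may be NULL */
    struct pkt *next;     /* link in a per-CPU defer list */
};

/* One CPU's defer list; it is only flushed when that CPU runs again. */
struct defer_list {
    struct pkt *head;
    int count;
};

/* Drop every external reference held by the packet. Clearing the
 * pointer makes a second call a no-op ("re-enterable"). */
static void pkt_release_head_state(struct pkt *p)
{
    if (p->ct) {
        p->ct->refcnt--;
        p->ct = NULL;
    }
}

/* Park a packet on a defer list instead of freeing it now. With
 * release_first set, external references are dropped up front, as the
 * patch proposes; otherwise they stay pinned until the flush. */
static void pkt_defer(struct defer_list *dl, struct pkt *p, int release_first)
{
    if (release_first)
        pkt_release_head_state(p);
    p->next = dl->head;
    dl->head = p;
    dl->count++;
}

/* Runs whenever the owning CPU gets around to it (possibly never). */
static void defer_list_flush(struct defer_list *dl)
{
    while (dl->head) {
        struct pkt *p = dl->head;
        dl->head = p->next;
        pkt_release_head_state(p); /* safe even if already released */
        free(p);
    }
    dl->count = 0;
}

/* Net cleanup can only finish once no packet pins the resource. The
 * kernel loop from the thread keeps retrying; this just reports. */
static int cleanup_can_finish(const struct ext_ref *ct)
{
    return ct->refcnt == 0;
}

/* Allocate a packet that takes a reference on ct. */
static struct pkt *pkt_alloc(struct ext_ref *ct)
{
    struct pkt *p = calloc(1, sizeof(*p));
    if (!p)
        abort();
    p->ct = ct;
    ct->refcnt++;
    return p;
}
```

In the model, a packet deferred without release_first keeps cleanup_can_finish() false until its list is flushed, which in the scenario from the thread may never happen; deferring with release_first makes the resource releasable immediately. Locking, per-CPU placement and the rest of an skb's state are deliberately left out.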
>
> Agree, but tcp_v4/6_rcv() already call nf_reset_ct(), else it would
> not be possible to remove the nf_conntrack module in practice.

Well, the existing nf_reset_ct() does not catch all cases, like TCP Fast Open?

Maybe 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists")
only widened the problem.

>
> I wonder where the deferred skbs are coming from; any and all
> queued skbs need the conntrack state dropped.
>
> I don't mind a new helper that does a combined dst+ct release though.
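A combined dst+ct release helper of the kind Florian mentions could take roughly the following shape. The thread gives no name or signature for it, so pkt_drop_dst_and_ct(), queue_to_socket() and the types below are hypothetical, modeled in userspace; in the kernel the conntrack half would correspond to nf_reset_ct() as discussed above. The point of the sketch is that one idempotent helper on the queueing path drops both references, so a packet waiting for recvmsg() cannot pin either object. It only helps if every path that queues a packet calls it, which is exactly the concern raised about cases such as TCP Fast Open.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted objects standing in for a route (dst)
 * entry and a conntrack entry. */
struct refobj {
    int refcnt;
};

/* Hypothetical received packet holding both kinds of reference. */
struct rx_pkt {
    struct refobj *dst;
    struct refobj *ct;
};

/* Drop one reference and clear the slot, so repeated calls are no-ops. */
static void refobj_put(struct refobj **slot)
{
    if (*slot) {
        (*slot)->refcnt--;
        *slot = NULL;
    }
}

/* Hypothetical combined dst+ct release helper. Neither object is
 * needed once the packet only waits for recvmsg(), so drop both. */
static void pkt_drop_dst_and_ct(struct rx_pkt *p)
{
    refobj_put(&p->dst);
    refobj_put(&p->ct);
}

/* Hypothetical queueing step: release first, then enqueue.
 * Returns 0 on success, -1 if the queue is full. */
static int queue_to_socket(struct rx_pkt *p, struct rx_pkt **rcvq,
                           size_t cap, size_t *qlen)
{
    if (*qlen >= cap)
        return -1;
    pkt_drop_dst_and_ct(p);
    rcvq[(*qlen)++] = p;
    return 0;
}
```

Putting the release inside the queueing step, rather than at each caller, is the design choice that makes a missed path less likely; the idempotent put means a later free or flush can call it again safely.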