Received: by 2002:a6b:fb09:0:0:0:0:0 with SMTP id h9csp4675013iog; Wed, 22 Jun 2022 03:44:44 -0700 (PDT) X-Google-Smtp-Source: AGRyM1t2JTIp+LanZrNoxG3aNTjh07HRJk8Li2wqiMlH+mS8LcOqX0VqmRz9RPcvyW3q41utRsi+ X-Received: by 2002:a05:6a00:1152:b0:4be:ab79:fcfa with SMTP id b18-20020a056a00115200b004beab79fcfamr34639757pfm.3.1655894683861; Wed, 22 Jun 2022 03:44:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1655894683; cv=none; d=google.com; s=arc-20160816; b=MAzCUGvAKEyWShf+flTkHR5mH3pvtqKeZ33/7yF1Bjf8yYfJsFBe54PVIFxGpbO0WC jS4DFvFjSTDcP6st0iqMdcx8o5COk6ERvNv9VZ2j3kyQ9h8HxUfL5RVsW4OnMMZGuB7i EcwD+xIHAg34QOdcFHz8BT53rOdGUF6MrH0XmXpw5DOLqkIlxseHbChOtLRNpmGRKDUH uf690htOosZeBm0czBHypx70P1dC/rUlf/8seun1TmVIRMDG8XqxVZNLbmlA04qgtfwO wOwACc70ee86piuvzTMkNAB4lVwHz8t83RABhVIadQa9yCeRvOJM5niTmR9O2f5cJaHX aj9Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date; bh=O5W30VtSJ4JctEgJNd0/A0NgRUkrMmjNRM+ZA3ZAv8E=; b=YXLGICEYNcAXRNpSfgdaCc+Jl5bmyDFPFg/y4p/Mv6r+f5yoRhFfYcA0ubLvsyv6O2 BsrQpjhZSXxp2heSOaHLX/UxC4Oeom2EIYO4fnxBlRU6ZmV1hcMebn38/A1BMzAcTYUU lCNlkweOj540k6SQfain5MFchGCd4xbCgWpLTFdJwKNTI1hPfiPWCnnY2qqG8dpQzACH Rc8Jdy/8Pi8fWAm4T6wR0GWOCf2+2PRTuoWFUsZY/M3w3hdlsJ7+q8/Bc/P8RDimFZiR iFcns/7ZoYyHCQPqeUzlVjPPbmJefW2Gbydb0XEO7HgrFYCpoc7CSQlun+Yrgn9plah2 33kw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 18-20020a17090a035200b001ecbd9a7fadsi7404506pjf.56.2022.06.22.03.44.29; Wed, 22 Jun 2022 03:44:43 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346969AbiFVKlZ (ORCPT + 99 others); Wed, 22 Jun 2022 06:41:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48654 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1353850AbiFVK2Z (ORCPT ); Wed, 22 Jun 2022 06:28:25 -0400 Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 456CEE4B; Wed, 22 Jun 2022 03:28:23 -0700 (PDT) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1o3xan-0002Cj-GJ; Wed, 22 Jun 2022 12:28:13 +0200 Date: Wed, 22 Jun 2022 12:28:13 +0200 From: Florian Westphal To: Eric Dumazet Cc: Ilya Maximets , netdev , "David S. Miller" , dev@openvswitch.org, LKML , Jakub Kicinski , Paolo Abeni , Florian Westphal Subject: Re: [PATCH net] net: ensure all external references are released in deferred skbuffs Message-ID: <20220622102813.GA24844@breakpoint.cc> References: <20220619003919.394622-1-i.maximets@ovn.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, SPF_HELO_PASS,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Eric Dumazet wrote: > On Sun, Jun 19, 2022 at 2:39 AM Ilya Maximets wrote: > > > > Open vSwitch system test suite is broken due to inability to > > load/unload netfilter modules. kworker thread is getting trapped > > in the infinite loop while running a net cleanup inside the > > nf_conntrack_cleanup_net_list, because deferred skbuffs are still > > holding nfct references and not being freed by their CPU cores. > > > > In general, the idea that we will have an rx interrupt on every > > CPU core at some point in a near future doesn't seem correct. > > Devices are getting created and destroyed, interrupts are getting > > re-scheduled, CPUs are going online and offline dynamically. > > Any of these events may leave packets stuck in defer list for a > > long time. It might be OK, if they are just a piece of memory, > > but we can't afford them holding references to any other resources. > > > > In case of OVS, nfct reference keeps the kernel thread in busy loop > > while holding a 'pernet_ops_rwsem' semaphore. That blocks the > > later modprobe request from user space: > > > > # ps > > 299 root R 99.3 200:25.89 kworker/u96:4+ > > > > # journalctl > > INFO: task modprobe:11787 blocked for more than 1228 seconds. > > Not tainted 5.19.0-rc2 #8 > > task:modprobe state:D > > Call Trace: > > > > __schedule+0x8aa/0x21d0 > > schedule+0xcc/0x200 > > rwsem_down_write_slowpath+0x8e4/0x1580 > > down_write+0xfc/0x140 > > register_pernet_subsys+0x15/0x40 > > nf_nat_init+0xb6/0x1000 [nf_nat] > > do_one_initcall+0xbb/0x410 > > do_init_module+0x1b4/0x640 > > load_module+0x4c1b/0x58d0 > > __do_sys_init_module+0x1d7/0x220 > > do_syscall_64+0x3a/0x80 > > entry_SYSCALL_64_after_hwframe+0x46/0xb0 > > > > At this point OVS testsuite is unresponsive and never recover, > > because these skbuffs are never freed. > > > > Solution is to make sure no external references attached to skb > > before pushing it to the defer list. Using skb_release_head_state() > > for that purpose. The function modified to be re-enterable, as it > > will be called again during the defer list flush. > > > > Another approach that can fix the OVS use-case, is to kick all > > cores while waiting for references to be released during the net > > cleanup. But that sounds more like a workaround for a current > > issue rather than a proper solution and will not cover possible > > issues in other parts of the code. > > > > Additionally checking for skb_zcopy() while deferring. This might > > not be necessary, as I'm not sure if we can actually have zero copy > > packets on this path, but seems worth having for completeness as we > > should never defer such packets regardless. > > > > CC: Eric Dumazet > > Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists") > > Signed-off-by: Ilya Maximets > > --- > > net/core/skbuff.c | 16 +++++++++++----- > > 1 file changed, 11 insertions(+), 5 deletions(-) > > I do not think this patch is doing the right thing. > > Packets sitting in TCP receive queues should not hold state that is > not relevant for TCP recvmsg(). Agree, but tcp_v4/6_rcv() already call nf_reset_ct(), else it would not be possible to remove nf_conntrack module in practice. I wonder where the deferred skbs are coming from, any and all queued skbs need the conntrack state dropped. I don't mind a new helper that does a combined dst+ct release though.