Received: by 2002:a05:6358:9144:b0:117:f937:c515 with SMTP id r4csp5111295rwr; Mon, 8 May 2023 18:37:10 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ56BUjnYyScy8rvTUwDTpcE0Xu6X9Kp1J2TgKbOQfmEGiXpFSHSkFWnOdTMFD3eN4n7q1Td X-Received: by 2002:a05:6a20:6a15:b0:f3:57b2:79e2 with SMTP id p21-20020a056a206a1500b000f357b279e2mr16344530pzk.12.1683596229956; Mon, 08 May 2023 18:37:09 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1683596229; cv=none; d=google.com; s=arc-20160816; b=pHtKqJeNZ+dRulKcL9nMobt8VuVhxTCS/loTTDRAMaE5Sal8l3IQ6bxl3z0Gjdxeyq I42KmHGF0+MAOco6JHBJj9Mr12u0KlZfBIUyJjK8RA+4VD4AhNP+IgLMP1aTEyBnsnwm fgr29UCnm0t0fYWj6JRIS8cTK6zHA5UhV5CBWwVh8rs42UGrqy4q6/sluMKzgEII1QfD 7r4cZlkYsb5WeSvFMc9Mj2SJjwuz3pxefklKUfSa3lmHhBB93O2oKWim9+GdTS7qPvdo FvuTap5xUVJSO8DqeZcqjQSn3b7xRfdkVMiWl+RSMBGlaKmFnSdBR3BURRS7RGYHcoO0 cmug== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date :dkim-signature; bh=ipBpTErB0ZuPOHrET9HboUbzRolAs2vRyXXMkjk4Rgw=; b=zTvtWX3vIQC6vGUxno54pRtBTgdyYVw4X4i3Bp1catn4o/gJV3+iq+6yCMXSzAfiY8 P55kxp61KAnp5CCk5JLeYRfxn+U4Vodlk+uuEMVBAjPqhszL6xvf/Yj5BdtWXO3co7LA RBtCQCgzB5zarWqqzDkGwFXeYMEoQ6uwL6egXOabpRALAIcsfy5Z3oniDLyAkXey1gCR 9IfkHQhtZL24DIdGY8qqn1DDfYy0i4LkYP8M7McwTYfc10TnbZv1XSV8xsG0gXtV/fk1 3spoStjvKRZjXJ1asO/uubRlEzH0sDEjgCtzEwCEJez7iOyw9wRVqfH7FZ6giNTAae0J FsBw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=iFUxhCiL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id k7-20020a633d07000000b00528cf858c87si355075pga.156.2023.05.08.18.36.57; Mon, 08 May 2023 18:37:09 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=iFUxhCiL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232838AbjEIBd3 (ORCPT + 99 others); Mon, 8 May 2023 21:33:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35230 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229539AbjEIBd2 (ORCPT ); Mon, 8 May 2023 21:33:28 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 74E064494 for ; Mon, 8 May 2023 18:33:27 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 0BC6362CE4 for ; Tue, 9 May 2023 01:33:27 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id BA099C433D2; Tue, 9 May 2023 01:33:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1683596006; bh=Ezq1uBXtkHMJMAu0MHKbIyYUPpeNC5x1Kp/DxAAAx0I=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=iFUxhCiLyuEiZWBD6LV+RXw5/W/DF5lUrqtC/oT+tR2EdYDBYRowcoj3KDfrKFx4C 4DhGyLNoCfVmeSZcTdx/skad/uJGoNw986S7GNGwxJ/Mfi7seMDuD0B/L3VKrBukLW 8jeMwAmhBTob+TtXg1buB2TCMBZIMOg/2OdijfO1Fg+uxLL5/ezTjx9tEpTn4GcPOJ Fd+vIuXzxPHaiHtFh2HjQI7Ta5/9UQG5hWCjiZTErQ+ej4KSx21j9AK4StwEHshDR4 rqDHGY7ue2qMPQH1znVDhD+XfF5w4/FcIK2ZIXz8e8WoWcLSvv+V2v7cmZQZLWc8co 5RQ30Hz2o+A4w== Date: Mon, 8 May 2023 18:33:24 -0700 From: Jakub Kicinski To: Peilin Ye Cc: "David S. Miller" , Eric Dumazet , Paolo Abeni , Jamal Hadi Salim , Cong Wang , Jiri Pirko , Peilin Ye , Daniel Borkmann , John Fastabend , Vlad Buslov , Pedro Tammela , Hillf Danton , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Cong Wang Subject: Re: [PATCH net 6/6] net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting Message-ID: <20230508183324.020f3ec7@kernel.org> In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 5 May 2023 17:16:10 -0700 Peilin Ye wrote: > Thread 1 creates ingress Qdisc A (containing mini Qdisc a1 and a2), then > adds a flower filter X to A. > > Thread 2 creates another ingress Qdisc B (containing mini Qdisc b1 and > b2) to replace A, then adds a flower filter Y to B. > > Thread 1 A's refcnt Thread 2 > RTM_NEWQDISC (A, RTNL-locked) > qdisc_create(A) 1 > qdisc_graft(A) 9 > > RTM_NEWTFILTER (X, RTNL-lockless) > __tcf_qdisc_find(A) 10 > tcf_chain0_head_change(A) > mini_qdisc_pair_swap(A) (1st) > | > | RTM_NEWQDISC (B, RTNL-locked) > RCU 2 qdisc_graft(B) > | 1 notify_and_destroy(A) > | > tcf_block_release(A) 0 RTM_NEWTFILTER (Y, RTNL-lockless) > qdisc_destroy(A) tcf_chain0_head_change(B) > tcf_chain0_head_change_cb_del(A) mini_qdisc_pair_swap(B) (2nd) > mini_qdisc_pair_swap(A) (3rd) | > ... ... > > Here, B calls mini_qdisc_pair_swap(), pointing eth0->miniq_ingress to its > mini Qdisc, b1. Then, A calls mini_qdisc_pair_swap() again during > ingress_destroy(), setting eth0->miniq_ingress to NULL, so ingress packets > on eth0 will not find filter Y in sch_handle_ingress(). > > This is just one of the possible consequences of concurrently accessing > net_device::miniq_{in,e}gress pointers. The point is clear, however: > B's first call to mini_qdisc_pair_swap() should take place after A's > last call, in qdisc_destroy(). Great analysis, thanks for squashing this bug. Have you considered creating a fix more localized to the miniq implementation? It seems that having per-device miniq pointers is incompatible with using reference counted objects. So miniq is a more natural place to solve the problem. Otherwise workarounds in the core keep piling up (here qdisc_graft()). Can we replace the rcu_assign_pointer in (3rd) with a cmpxchg()? If active qdisc is neither a1 nor a2 we should leave the dev state alone.