Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp906441rwd; Tue, 13 Jun 2023 01:54:32 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7VP9u0b3CFeLZz7EBOASGxpBbzJRhW2j0paFEo+WiNsTJ0pvWBN7MvCQ69pQs2okbd+q+/ X-Received: by 2002:a17:907:7f14:b0:974:571f:8d0f with SMTP id qf20-20020a1709077f1400b00974571f8d0fmr12010309ejc.60.1686646471697; Tue, 13 Jun 2023 01:54:31 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686646471; cv=none; d=google.com; s=arc-20160816; b=DPI7zdPnEwAY4Gw89gOh4WEIJDJ8ru0hClFpCkdV2BTB5CNAJXYcOVrPEZrecdpaim wPz0SjpOCfeqTJe33poUwwhIY5AiqWfoBNlUv2b8f+wIBQG/robiEjVW087mUEF1r1Ml RKkC03dFELdnLiljfovk5KUbdxz/AzcUr673uVSmFrMKMD8lyhhmq1aNJaJnV1G39eRJ /9HBsNVNSnjlD0nt56fWjC0x8qXsLpbB0ch96hBAEqk65/h/8cUCtpEZ/x4oGARRkn9M Fm+x5PysJzsuvjqYsatA20U233Hz4cgqxiTlMq1u12zM9qio3QwNqsn38aN2ojUGSZBf XIAw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:user-agent :content-transfer-encoding:references:in-reply-to:date:cc:to:from :subject:message-id:dkim-signature; bh=vrGYNaIheG5cW9vw0WAFpAIgS9tsi2HOTi0exhZ2A/c=; b=kispll7ll0+YBenAh7MtxQ+AXgSNwcAgQ4kgp1NiwdSMA0Tw2SdJoXHgV5LBkNx0Xs HHJZVU2T9ZBwU/mU/uZlWGDW8mhGCBOZsMgpkKD4DemcS+1nkU8HIgZYFnRGBg5tNZJd gJUTNPnvlG9VFlo4EzJ7P7riJ8MjYqLm97PhG3ByNqft9OKumbTJMLyoh6EgO2EoG2EA nS/FWRhuPQd5Lp4rsmsIL7UqEROnGrqxTQgYvPYRKYS5SfCVkhaPLMr210tCYN+e/jKE t2wiP1cAxjWHqCQFhxUW0OFU+30+Lps/kUu2rFG4I7ygze+aa6Y/7zTYmA5Cx0pePnsP VN2A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=faI69AB7; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id l17-20020a1709066b9100b0098237f8a78dsi854299ejr.942.2023.06.13.01.54.06; Tue, 13 Jun 2023 01:54:31 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=faI69AB7; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241751AbjFMIlK (ORCPT + 99 others); Tue, 13 Jun 2023 04:41:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48712 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235640AbjFMIkb (ORCPT ); Tue, 13 Jun 2023 04:40:31 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5AFC11998 for ; Tue, 13 Jun 2023 01:39:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1686645581; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vrGYNaIheG5cW9vw0WAFpAIgS9tsi2HOTi0exhZ2A/c=; b=faI69AB7Z74YUFcNPuxJw61EISqqTpX+oVHbkWuoU0bLazWCBHDBt8WXyX7S2VTfrHcAbe rNCP8HC0Lsmn6uiTyaZ6C4IOTSGgCTSlJwUVH1b4Ip4acO04OazW/XhqE+HwERnaDOIznE 7kMKTQ0q/lu9SQM7v3meiX75Lq/9GSk= Received: from mail-qv1-f70.google.com (mail-qv1-f70.google.com [209.85.219.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-615-YiZQAdfiO3quRZa3YQgfdA-1; Tue, 13 Jun 2023 04:39:39 -0400 X-MC-Unique: YiZQAdfiO3quRZa3YQgfdA-1 Received: by mail-qv1-f70.google.com with SMTP id 6a1803df08f44-62849c5e9f0so6816016d6.1 for ; Tue, 13 Jun 2023 01:39:39 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686645579; x=1689237579; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:cc:to:from:subject:message-id:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=vrGYNaIheG5cW9vw0WAFpAIgS9tsi2HOTi0exhZ2A/c=; b=dQRsLu3ULXj+Yobd4RvhZ9BPoK8nybpkDo+nAhFRv88+5xyUfJQSwuxcTSsFMy+BD6 clSBeaJhT4a+tpjxq6jnAw9B0TW6wfU9q1jSypPCpUvDhMIT5JFHWClaSdrtzCpqdF9R ZPazlbSjLjdNxkTipPFSvXGaZFhnoAet3Rrs5/BEcJ6/+8fbUrUa0aRnUHWurPNpIiRx 2VrjxvOPJnU5HQc7CENHR778j55B/linnZA775NXP1gUng2CgyQpp0lpxfn21nJc3MdK gzSbeYzvCTuUnIvYL3dG2/1Tv/LSYmdynaXDr4lu1AOXJr8NpPJmkWit6cCXYGuN0WOz wHbw== X-Gm-Message-State: AC+VfDzBO9qmpxiqS6LlaJUqhMlLuxwO1qLJjZOOULkxLlUiBPLf4AIU 6Izh3rV8C4jBT/9AxbrOfApzkM2SwCauxmHJjPlquR97FR8oeGBEv4aBUkBXLQSJ/pLD/OH37Sj CtZlUW07Ju70Wsr5qFH3+XIl8 X-Received: by 2002:a05:6214:763:b0:62d:eceb:f7ce with SMTP id f3-20020a056214076300b0062decebf7cemr5372272qvz.1.1686645578865; Tue, 13 Jun 2023 01:39:38 -0700 (PDT) X-Received: by 2002:a05:6214:763:b0:62d:eceb:f7ce with SMTP id f3-20020a056214076300b0062decebf7cemr5372260qvz.1.1686645578567; Tue, 13 Jun 2023 01:39:38 -0700 (PDT) Received: from gerbillo.redhat.com (146-241-245-147.dyn.eolo.it. [146.241.245.147]) by smtp.gmail.com with ESMTPSA id e21-20020a0caa55000000b00626330a39ecsm3835317qvb.9.2023.06.13.01.39.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 13 Jun 2023 01:39:38 -0700 (PDT) Message-ID: <7f773c114001cbcd0c6ff21da9976eb0ba533421.camel@redhat.com> Subject: Re: [PATCH net 2/2] net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting From: Paolo Abeni To: Peilin Ye , Jamal Hadi Salim , Cong Wang , Jiri Pirko , "David S. Miller" , Eric Dumazet , Jakub Kicinski Cc: Peilin Ye , Vlad Buslov , Pedro Tammela , John Fastabend , Daniel Borkmann , Hillf Danton , Zhengchao Shao , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Cong Wang Date: Tue, 13 Jun 2023 10:39:33 +0200 In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.46.4 (3.46.4-1.fc37) MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, On Sat, 2023-06-10 at 20:30 -0700, Peilin Ye wrote: > From: Peilin Ye >=20 > mini_Qdisc_pair::p_miniq is a double pointer to mini_Qdisc, initialized > in ingress_init() to point to net_device::miniq_ingress. ingress Qdiscs > access this per-net_device pointer in mini_qdisc_pair_swap(). Similar > for clsact Qdiscs and miniq_egress. >=20 > Unfortunately, after introducing RTNL-unlocked RTM_{NEW,DEL,GET}TFILTER > requests (thanks Hillf Danton for the hint), when replacing ingress or > clsact Qdiscs, for example, the old Qdisc ("@old") could access the same > miniq_{in,e}gress pointer(s) concurrently with the new Qdisc ("@new"), > causing race conditions [1] including a use-after-free bug in > mini_qdisc_pair_swap() reported by syzbot: >=20 > BUG: KASAN: slab-use-after-free in mini_qdisc_pair_swap+0x1c2/0x1f0 net/= sched/sch_generic.c:1573 > Write of size 8 at addr ffff888045b31308 by task syz-executor690/14901 > ... > Call Trace: > > __dump_stack lib/dump_stack.c:88 [inline] > dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106 > print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:319 > print_report mm/kasan/report.c:430 [inline] > kasan_report+0x11c/0x130 mm/kasan/report.c:536 > mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573 > tcf_chain_head_change_item net/sched/cls_api.c:495 [inline] > tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:509 > tcf_chain_tp_insert net/sched/cls_api.c:1826 [inline] > tcf_chain_tp_insert_unique net/sched/cls_api.c:1875 [inline] > tc_new_tfilter+0x1de6/0x2290 net/sched/cls_api.c:2266 > ... >=20 > @old and @new should not affect each other. In other words, @old should > never modify miniq_{in,e}gress after @new, and @new should not update > @old's RCU state. >=20 > Fixing without changing sch_api.c turned out to be difficult (please > refer to Closes: for discussions). Instead, make sure @new's first call > always happen after @old's last call (in {ingress,clsact}_destroy()) has > finished: >=20 > In qdisc_graft(), return -EBUSY if @old has any ongoing filter requests, > and call qdisc_destroy() for @old before grafting @new. >=20 > Introduce qdisc_refcount_dec_if_one() as the counterpart of > qdisc_refcount_inc_nz() used for filter requests. Introduce a > non-static version of qdisc_destroy() that does a TCQ_F_BUILTIN check, > just like qdisc_put() etc. >=20 > Depends on patch "net/sched: Refactor qdisc_graft() for ingress and > clsact Qdiscs". >=20 > [1] To illustrate, the syzkaller reproducer adds ingress Qdiscs under > TC_H_ROOT (no longer possible after commit c7cfbd115001 ("net/sched: > sch_ingress: Only create under TC_H_INGRESS")) on eth0 that has 8 > transmission queues: >=20 > Thread 1 creates ingress Qdisc A (containing mini Qdisc a1 and a2), > then adds a flower filter X to A. >=20 > Thread 2 creates another ingress Qdisc B (containing mini Qdisc b1 and > b2) to replace A, then adds a flower filter Y to B. >=20 > Thread 1 A's refcnt Thread 2 > RTM_NEWQDISC (A, RTNL-locked) > qdisc_create(A) 1 > qdisc_graft(A) 9 >=20 > RTM_NEWTFILTER (X, RTNL-unlocked) > __tcf_qdisc_find(A) 10 > tcf_chain0_head_change(A) > mini_qdisc_pair_swap(A) (1st) > | > | RTM_NEWQDISC (B, RTNL-locked) > RCU sync 2 qdisc_graft(B) > | 1 notify_and_destroy(A) > | > tcf_block_release(A) 0 RTM_NEWTFILTER (Y, RTNL-unlocked) > qdisc_destroy(A) tcf_chain0_head_change(B) > tcf_chain0_head_change_cb_del(A) mini_qdisc_pair_swap(B) (2nd) > mini_qdisc_pair_swap(A) (3rd) | > ... ... >=20 > Here, B calls mini_qdisc_pair_swap(), pointing eth0->miniq_ingress to > its mini Qdisc, b1. Then, A calls mini_qdisc_pair_swap() again during > ingress_destroy(), setting eth0->miniq_ingress to NULL, so ingress > packets on eth0 will not find filter Y in sch_handle_ingress(). >=20 > This is just one of the possible consequences of concurrently accessing > miniq_{in,e}gress pointers. >=20 > Fixes: 7a096d579e8e ("net: sched: ingress: set 'unlocked' flag for Qdisc = ops") > Fixes: 87f373921c4e ("net: sched: ingress: set 'unlocked' flag for clsact= Qdisc ops") > Reported-by: syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com > Closes: https://lore.kernel.org/r/0000000000006cf87705f79acf1a@google.com= / > Cc: Hillf Danton > Cc: Vlad Buslov > Signed-off-by: Peilin Ye The fixes LGTM, but I guess this could deserve an explicit ack from Jakub, as he raised to point for the full retry implementation. Cheers, Paolo