Date: Sat, 13 Jan 2024 19:24:07 +0100 (CET)
From: Jozsef Kadlecsik
To: David Wang <00107082@163.com>
cc: ale.crismani@automattic.com, linux-kernel@vger.kernel.org, netfilter-devel@vger.kernel.org, Pablo Neira Ayuso, xiaolinkui@kylinos.cn
Subject: Re:Performance regression in ip_set_swap on 6.7.0
In-Reply-To: <20240111145330.18474-1-00107082@163.com>

On Thu, 11 Jan 2024, David Wang wrote:

> I tested the patch with
> code stressing swap->destroy->create->add 10000
> times, and the performance regression still happens; now it is in
> ip_set_destroy. (I pasted the test code at the end of this mail.)
> time shows that most of the delay is 'off CPU':
> $ time sudo ./stressipset
>
> real	2m45.115s
> user	0m0.019s
> sys	0m0.744s
>
> Most of the time, the call stack is stuck in rcu_barrier:
> $ sudo cat /proc/2158/stack
> [<0>] rcu_barrier+0x1f6/0x2d0
> [<0>] ip_set_destroy+0x84/0x1d0 [ip_set]
> [<0>] nfnetlink_rcv_msg+0x2ac/0x2f0 [nfnetlink]
> [<0>] netlink_rcv_skb+0x57/0x100
> [<0>] netlink_unicast+0x19a/0x280
> [<0>] netlink_sendmsg+0x250/0x4d0
> [<0>] __sys_sendto+0x1be/0x1d0
> [<0>] __x64_sys_sendto+0x20/0x30
> [<0>] do_syscall_64+0x42/0xf0
> [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
>
> perf_event_open profiling shows a similar call signature for rcu_barrier and synchronize_rcu:
>
> ip_set_destroy(49.651% 2133/4296)
>     rcu_barrier(80.684% 1721/2133)
>         wait_for_completion(79.198% 1363/1721)
>             schedule_timeout(94.864% 1293/1363)
>                 schedule(96.520% 1248/1293)
>                     __schedule(97.436% 1216/1248)
>                     preempt_count_add(0.240% 3/1248)
>                     srso_return_thunk(0.160% 2/1248)
>                     preempt_count_sub(0.160% 2/1248)
>                 srso_return_thunk(0.077% 1/1293)
>             _raw_spin_unlock_irq(1.027% 14/1363)
>             _raw_spin_lock_irq(0.514% 7/1363)
>             __cond_resched(0.220% 3/1363)
>             srso_return_thunk(0.147% 2/1363)
>
> ip_set_swap(79.842% 709/888) (this profiling was captured when synchronize_rcu was used in ip_set_swap)
>     synchronize_rcu(74.330% 527/709)
>         __wait_rcu_gp(89.184% 470/527)
>             wait_for_completion(86.383% 406/470)
>                 schedule_timeout(91.133% 370/406)
>                     schedule(95.135% 352/370)
>                 _raw_spin_unlock_irq(3.202% 13/406)
>                 _raw_spin_lock_irq(0.739% 3/406)
>                 srso_return_thunk(0.246% 1/406)
>             _raw_spin_unlock_irq(7.021% 33/470)
>             __call_rcu_common.constprop.0(3.830% 18/470)
>         rcu_gp_is_expedited(3.036% 16/527)
>         __cond_resched(0.569% 3/527)
>         srso_return_thunk(0.190% 1/527)
>
> They all call wait_for_completion, which may sleep on something on
> purpose, I
> guess...

That's expected, because ip_set_destroy() calls rcu_barrier(), which is
needed to handle the flush of list:set type sets. However, rcu_barrier()
combined with call_rcu() makes a series of back-to-back destroys slow.
Since rcu_barrier() is needed only for list:set type sets, that case can
be handled separately.

So could you test the patch below? According to my tests it is even a
little bit faster than the original code was before synchronize_rcu()
was added to swap.

diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h
index e8c350a3ade1..912f750d0bea 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -242,6 +242,8 @@ extern void ip_set_type_unregister(struct ip_set_type *set_type);
 
 /* A generic IP set */
 struct ip_set {
+	/* For call_rcu in destroy */
+	struct rcu_head rcu;
 	/* The name of the set */
 	char name[IPSET_MAXNAMELEN];
 	/* Lock protecting the set data */
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index 4c133e06be1d..3bf9bb345809 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1182,6 +1182,14 @@ ip_set_destroy_set(struct ip_set *set)
 	kfree(set);
 }
 
+static void
+ip_set_destroy_set_rcu(struct rcu_head *head)
+{
+	struct ip_set *set = container_of(head, struct ip_set, rcu);
+
+	ip_set_destroy_set(set);
+}
+
 static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info,
 			  const struct nlattr * const attr[])
 {
@@ -1193,8 +1201,6 @@ static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info,
 	if (unlikely(protocol_min_failed(attr)))
 		return -IPSET_ERR_PROTOCOL;
 
-	/* Must wait for flush to be really finished in list:set */
-	rcu_barrier();
 
 	/* Commands are serialized and references are
 	 * protected by the ip_set_ref_lock.
@@ -1206,8 +1212,10 @@ static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info,
 	 * counter, so if it's already zero, we can proceed
 	 * without holding the lock.
 	 */
-	read_lock_bh(&ip_set_ref_lock);
 	if (!attr[IPSET_ATTR_SETNAME]) {
+		/* Must wait for flush to be really finished in list:set */
+		rcu_barrier();
+		read_lock_bh(&ip_set_ref_lock);
 		for (i = 0; i < inst->ip_set_max; i++) {
 			s = ip_set(inst, i);
 			if (s && (s->ref || s->ref_netlink)) {
@@ -1228,6 +1236,9 @@ static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info,
 		inst->is_destroyed = false;
 	} else {
 		u32 flags = flag_exist(info->nlh);
+		u16 features = 0;
+
+		read_lock_bh(&ip_set_ref_lock);
 		s = find_set_and_id(inst, nla_data(attr[IPSET_ATTR_SETNAME]),
 				    &i);
 		if (!s) {
@@ -1238,10 +1249,14 @@ static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info,
 			ret = -IPSET_ERR_BUSY;
 			goto out;
 		}
+		features = s->type->features;
 		ip_set(inst, i) = NULL;
 		read_unlock_bh(&ip_set_ref_lock);
-
-		ip_set_destroy_set(s);
+		if (features & IPSET_TYPE_NAME) {
+			/* Must wait for flush to be really finished */
+			rcu_barrier();
+		}
+		call_rcu(&s->rcu, ip_set_destroy_set_rcu);
 	}
 	return 0;
 out:
@@ -1394,9 +1409,6 @@ static int ip_set_swap(struct sk_buff *skb, const struct nfnl_info *info,
 	ip_set(inst, to_id) = from;
 	write_unlock_bh(&ip_set_ref_lock);
 
-	/* Make sure all readers of the old set pointers are completed. */
-	synchronize_rcu();
-
 	return 0;
 }
 
@@ -2357,6 +2369,9 @@ ip_set_net_exit(struct net *net)
 
 	inst->is_deleted = true; /* flag for ip_set_nfnl_put */
 
+	/* Wait for call_rcu() in destroy */
+	rcu_barrier();
+
 	nfnl_lock(NFNL_SUBSYS_IPSET);
 	for (i = 0; i < inst->ip_set_max; i++) {
 		set = ip_set(inst, i);

Best regards,
Jozsef
--
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
          H-1525 Budapest 114, POB. 49, Hungary
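The deferred-destroy pattern used in the patch in this mail (embed an rcu_head in struct ip_set, recover the set with container_of() inside the callback, queue it with call_rcu(), and call rcu_barrier() only where queued callbacks must have finished, as in ip_set_net_exit()) can be illustrated with a small userspace model. This is a hedged sketch, not kernel code: fake_rcu_head, fake_call_rcu(), fake_rcu_barrier(), fake_set and destroy_sets_deferred() are hypothetical names, and the model has no real grace periods. Queued callbacks simply run when fake_rcu_barrier() drains the queue, whereas in the kernel they run asynchronously after a grace period.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct rcu_head, NOT the kernel type. */
struct fake_rcu_head {
	struct fake_rcu_head *next;
	void (*func)(struct fake_rcu_head *);
};

/* Simplified container_of(), without the kernel's type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Callbacks still waiting for a (simulated) grace period. */
static struct fake_rcu_head *pending;
static int freed_count;

/* Hypothetical call_rcu() model: enqueue the callback and return at once. */
static void fake_call_rcu(struct fake_rcu_head *head,
			  void (*func)(struct fake_rcu_head *))
{
	assert(head && func);
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Hypothetical rcu_barrier() model: return only after every queued
 * callback has run. Here the "grace period" is just draining the queue. */
static void fake_rcu_barrier(void)
{
	while (pending) {
		struct fake_rcu_head *head = pending;

		pending = head->next;
		head->func(head);
	}
}

/* Shaped like struct ip_set in the patch: the rcu head is embedded. */
struct fake_set {
	struct fake_rcu_head rcu;
	char name[32];
};

/* Counterpart of ip_set_destroy_set_rcu(): recover the set, then free it. */
static void fake_destroy_set_rcu(struct fake_rcu_head *head)
{
	struct fake_set *set = container_of(head, struct fake_set, rcu);

	free(set);
	freed_count++;
}

/* Destroy n sets back to back: each call only queues work, none waits.
 * Returns the number of sets queued, or -1 if an allocation fails. */
static int destroy_sets_deferred(int n)
{
	for (int i = 0; i < n; i++) {
		struct fake_set *set = calloc(1, sizeof(*set));

		if (!set)
			return -1;
		snprintf(set->name, sizeof(set->name), "set%d", i);
		fake_call_rcu(&set->rcu, fake_destroy_set_rcu);
	}
	return n;
}
```

In a caller, destroy_sets_deferred(10000) returns without blocking and freed_count stays at 0; only fake_rcu_barrier(), the analogue of the rcu_barrier() added to ip_set_net_exit(), runs the queued callbacks and brings freed_count to 10000. The point of the model is that the per-destroy wait disappears from the hot path and the cost of waiting is paid once, where teardown actually requires it.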