Received: by 2002:ac0:a594:0:0:0:0:0 with SMTP id m20-v6csp2100998imm; Wed, 16 May 2018 07:55:48 -0700 (PDT) X-Google-Smtp-Source: AB8JxZp6JhuyT5nW9eCDqwLCyqzzH8gaD6Sn2fjtlbx7xyh9wWVUxU9QUG2pfrmU/kD+u+igb7gg X-Received: by 2002:a63:8f43:: with SMTP id r3-v6mr1020827pgn.10.1526482548564; Wed, 16 May 2018 07:55:48 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1526482548; cv=none; d=google.com; s=arc-20160816; b=RcXbMd8/o8lZDD+0vJcPAQ2ws5KBHUqSyGOeJWexyCvFZiZuuAjlbH+CVNp59khRjo GkCqXG6XMUuNFA0TMpmm+6VsH3bxKyNL/jabbIXyKvnjBitoX/AcOmAmIFk169jh1/H4 E+54ZkvNC+BKupFv/+3FPydU8Gv7VDQRFHkpPEaHklncyHzPkN/mV9lBH02vEOnouswC v8Um1A8rujOdFRuLiFrXVCtHMUtyEiV39XQ8coKXlhJam1bi3IO/0Tj9gZyKXyz2PJsV a5//Vz+xU/favzRR541EM/lAsGW2KBuoqTrgkGcXjr+oHDyQMgg1TQRZznVpYbDutXQP vCaA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature:arc-authentication-results; bh=JsevyDETDBnlE0JXOXgPC+hQtHwlleYP/jEY9SFwJcI=; b=fsnSIQHaF0khLGtjVDa0FFukSTVOw46oP2tEt35wvGaFVIexs+MLpvQrJS+O5xzwSg kVcWdIMi7jhQW6FT4lz6Ubu9i8CrIgrTNHQdPeYYfIqP3+uxbuHbZIw9lexScMJErHr1 i3arWI7PyRGxA/TnIE4MM2/btav3rBgBqPfyhKpT2ILJIyw4mh2suoUZTkoHIC5SyWyO R7bgxHAjK8CXEOxT7v/gc0Y1XP0BsHCDhbIxu/wJND5UJZBFZMteUnNdbpMy5KesmkZU 4QLAY98xNlodPDOUW7qQA9AT3NoKNMZ4g/XwLhNwZs2uEUO7hQPscLHb99jQiwM4b+pd bzgQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@resnulli-us.20150623.gappssmtp.com header.s=20150623 header.b=m4nqy7fP; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id n4-v6si2813814pfk.277.2018.05.16.07.55.33; Wed, 16 May 2018 07:55:48 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@resnulli-us.20150623.gappssmtp.com header.s=20150623 header.b=m4nqy7fP; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750879AbeEPOzU (ORCPT + 99 others); Wed, 16 May 2018 10:55:20 -0400 Received: from mail-wm0-f67.google.com ([74.125.82.67]:40230 "EHLO mail-wm0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750846AbeEPOzS (ORCPT ); Wed, 16 May 2018 10:55:18 -0400 Received: by mail-wm0-f67.google.com with SMTP id j5-v6so2529731wme.5 for ; Wed, 16 May 2018 07:55:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=resnulli-us.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=JsevyDETDBnlE0JXOXgPC+hQtHwlleYP/jEY9SFwJcI=; b=m4nqy7fPbFPzRYcusl89J+WK241hBbskU7LMwe0j6mevlHeRBDq07MqgsN72q+GO95 86h516eZsOgF5rS7Lcv4sDS4mgozefmT586hJyegLff1ZDh+EpZQ+xXOXZj8J6Krxw8F 76Wsqg+fH7PHBkplKiMHJDXeSWkw7qmD+g8WWmrm46QHkGPkr6B39eXB3u2rrIkl+54A 5pMFGMnetsFrL8bWml0lGfNB1xVBrEw90Z2VcrQPInQXyaycP2F1AY5wAMfPUM3sEEEE FU325tao81WzUiFy00sIxjYSEffLLd10NL3bYilzv3st8irBNCRblGSHl8xu1XlBEH4k dh6A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=JsevyDETDBnlE0JXOXgPC+hQtHwlleYP/jEY9SFwJcI=; b=Pp4cNyDy2qdYnINK9aiay6+6UEshGX4CJpqaypfOGnK0Tzprx8vmUioanyVe1DFXQ9 zrSip7xLCXgc3DgzXl5eLm4jwA4CYJ6ncC8e4+YNf9eUWsPbwLk+XPWw8osShvkj8Oyk +GAXN3RGjpUhLHofVugT+ab6UA6g7PtiTCHqKeT8hhjAHhy7IlEnAmxuDzTrPcrPCg3a 5KXhtn3BAfV1XmXA1vifg0lcY6mymSAY9DPyR6Q3Ma6FqBFe3r6LvtLwXycs07umCjNQ 2y65lN18wdf2wIJOva/FY1QnWxbT1AxsDhWlXnioidzW/xYkQgbZVHGUfnvx0CBxkdmL negA== X-Gm-Message-State: ALKqPwejYuFs+YjZZfoUmimxY7tcrZM3yZsTBVx38A6FVoHXl7gOh/Gn LTdZSnFJrQzeBWV+8BQZTPQuLg== X-Received: by 2002:a1c:eb0e:: with SMTP id j14-v6mr804921wmh.87.1526482516977; Wed, 16 May 2018 07:55:16 -0700 (PDT) Received: from localhost (ip-94-113-127-8.net.upcbroadband.cz. [94.113.127.8]) by smtp.gmail.com with ESMTPSA id 12-v6sm5296525wmn.27.2018.05.16.07.55.16 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Wed, 16 May 2018 07:55:16 -0700 (PDT) Date: Wed, 16 May 2018 16:55:15 +0200 From: Jiri Pirko To: Vlad Buslov Cc: netdev@vger.kernel.org, davem@davemloft.net, jhs@mojatatu.com, xiyou.wangcong@gmail.com, pablo@netfilter.org, kadlec@blackhole.kfki.hu, fw@strlen.de, ast@kernel.org, daniel@iogearbox.net, edumazet@google.com, keescook@chromium.org, linux-kernel@vger.kernel.org, netfilter-devel@vger.kernel.org, coreteam@netfilter.org, kliteyn@mellanox.com Subject: Re: [PATCH 12/14] net: sched: retry action check-insert on concurrent modification Message-ID: <20180516145515.GP1972@nanopsycho> References: <1526308035-12484-1-git-send-email-vladbu@mellanox.com> <1526308035-12484-13-git-send-email-vladbu@mellanox.com> <20180516095953.GI1972@nanopsycho> <20180516122600.GM1972@nanopsycho> <20180516132135.GN1972@nanopsycho> <20180516141319.GO1972@nanopsycho> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.9.2 (2017-12-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Wed, May 16, 2018 at 04:26:40PM CEST, vladbu@mellanox.com wrote: > >On Wed 16 May 2018 at 14:13, Jiri Pirko wrote: >> Wed, May 16, 2018 at 03:52:20PM CEST, vladbu@mellanox.com wrote: >>> >>>On Wed 16 May 2018 at 13:21, Jiri Pirko wrote: >>>> Wed, May 16, 2018 at 02:43:58PM CEST, vladbu@mellanox.com wrote: >>>>> >>>>>On Wed 16 May 2018 at 12:26, Jiri Pirko wrote: >>>>>> Wed, May 16, 2018 at 01:55:06PM CEST, vladbu@mellanox.com wrote: >>>>>>> >>>>>>>On Wed 16 May 2018 at 09:59, Jiri Pirko wrote: >>>>>>>> Mon, May 14, 2018 at 04:27:13PM CEST, vladbu@mellanox.com wrote: >>>>>>>>>Retry check-insert sequence in action init functions if action with same >>>>>>>>>index was inserted concurrently. >>>>>>>>> >>>>>>>>>Signed-off-by: Vlad Buslov >>>>>>>>>--- >>>>>>>>> net/sched/act_bpf.c | 8 +++++++- >>>>>>>>> net/sched/act_connmark.c | 8 +++++++- >>>>>>>>> net/sched/act_csum.c | 8 +++++++- >>>>>>>>> net/sched/act_gact.c | 8 +++++++- >>>>>>>>> net/sched/act_ife.c | 8 +++++++- >>>>>>>>> net/sched/act_ipt.c | 8 +++++++- >>>>>>>>> net/sched/act_mirred.c | 8 +++++++- >>>>>>>>> net/sched/act_nat.c | 8 +++++++- >>>>>>>>> net/sched/act_pedit.c | 8 +++++++- >>>>>>>>> net/sched/act_police.c | 9 ++++++++- >>>>>>>>> net/sched/act_sample.c | 8 +++++++- >>>>>>>>> net/sched/act_simple.c | 9 ++++++++- >>>>>>>>> net/sched/act_skbedit.c | 8 +++++++- >>>>>>>>> net/sched/act_skbmod.c | 8 +++++++- >>>>>>>>> net/sched/act_tunnel_key.c | 9 ++++++++- >>>>>>>>> net/sched/act_vlan.c | 9 ++++++++- >>>>>>>>> 16 files changed, 116 insertions(+), 16 deletions(-) >>>>>>>>> >>>>>>>>>diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c >>>>>>>>>index 5554bf7..7e20fdc 100644 >>>>>>>>>--- a/net/sched/act_bpf.c >>>>>>>>>+++ b/net/sched/act_bpf.c >>>>>>>>>@@ -299,10 +299,16 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla, >>>>>>>>> >>>>>>>>> parm = nla_data(tb[TCA_ACT_BPF_PARMS]); >>>>>>>>> >>>>>>>>>+replay: >>>>>>>>> if (!tcf_idr_check(tn, parm->index, act, bind)) { >>>>>>>>> ret = tcf_idr_create(tn, parm->index, est, act, >>>>>>>>> &act_bpf_ops, bind, true); >>>>>>>>>- if (ret < 0) >>>>>>>>>+ /* Action with specified index was created concurrently. >>>>>>>>>+ * Check again. >>>>>>>>>+ */ >>>>>>>>>+ if (parm->index && ret == -ENOSPC) >>>>>>>>>+ goto replay; >>>>>>>>>+ else if (ret) >>>>>>>> >>>>>>>> Hmm, looks like you are doing the same/very similar thing in every act >>>>>>>> code. I think it would make sense to introduce a helper function for >>>>>>>> this purpose. >>>>>>> >>>>>>>This code uses goto so it can't be easily refactored into standalone >>>>>>>function. Could you specify which part of this code you suggest to >>>>>>>extract? >>>>>> >>>>>> Hmm, looking at the code, I think that what would help is to have a >>>>>> helper that would atomically check if index exists and if not, it would >>>>>> allocate one. Something like: >>>>>> >>>>>> >>>>>> int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index, >>>>>> struct tc_action **a, int bind) >>>>>> { >>>>>> struct tcf_idrinfo *idrinfo = tn->idrinfo; >>>>>> struct tc_action *p; >>>>>> int err; >>>>>> >>>>>> spin_lock(&idrinfo->lock); >>>>>> if (*index) { >>>>>> p = idr_find(&idrinfo->action_idr, *index); >>>>>> if (p) { >>>>>> if (bind) >>>>>> p->tcfa_bindcnt++; >>>>>> p->tcfa_refcnt++; >>>>>> *a = p; >>>>>> err = 0; >>>>>> } else { >>>>>> *a = NULL; >>>>>> err = idr_alloc_u32(idr, NULL, index, >>>>>> *index, GFP_ATOMIC); >>>>>> } >>>>>> } else { >>>>>> *index = 1; >>>>>> *a = NULL; >>>>>> err = idr_alloc_u32(idr, NULL, index, UINT_MAX, GFP_ATOMIC); >>>>>> } >>>>>> spin_unlock(&idrinfo->lock); >>>>>> return err; >>>>>> } >>>>>> >>>>>> The act code would just check if "a" is NULL and if so, it would call >>>>>> tcf_idr_create() with allocated index as arg. >>>>> >>>>>What about multiple actions that have arbitrary code between initial >>>>>check and idr allocation that is currently inside tcf_idr_create()? >>>> >>>> Why it would be a problem to have them after the allocation? >>> >>>Lets look at mirred for exmple: >>> exists = tcf_idr_check(tn, parm->index, a, bind); >>> if (exists && bind) >>> return 0; >>> >>> switch (parm->eaction) { >>> case TCA_EGRESS_MIRROR: >>> case TCA_EGRESS_REDIR: >>> case TCA_INGRESS_REDIR: >>> case TCA_INGRESS_MIRROR: >>> break; >>> default: >>> if (exists) >>> tcf_idr_release(*a, bind); >>> NL_SET_ERR_MSG_MOD(extack, "Unknown mirred option"); >>> return -EINVAL; >>> } >>> if (parm->ifindex) { >>> dev = __dev_get_by_index(net, parm->ifindex); >>> if (dev == NULL) { >>> if (exists) >>> tcf_idr_release(*a, bind); >>> return -ENODEV; >>> } >>> mac_header_xmit = dev_is_mac_header_xmit(dev); >>> } else { >>> dev = NULL; >>> } >>> >>> if (!exists) { >>> if (!dev) { >>> NL_SET_ERR_MSG_MOD(extack, "Specified device does not exist"); >>> return -EINVAL; >>> } >>> ret = tcf_idr_create(tn, parm->index, est, a, >>> &act_mirred_ops, bind, true); >>> /* Action with specified index was created concurrently. >>> * Check again. >>> */ >>> if (parm->index && ret == -ENOSPC) >>> goto replay; >>> else if (ret) >>> return ret; >>> >>>There are several returns and cleanup is only performed when action >>>exists. So all code like that will have to be audited to also remove >>>index from idr, otherwise idr handles leak on return. >> >> Yeah. You have to take care of the error path. >> >> >>> >>>> >>>> There is one issue though with my draft. tcf_idr_insert() function >>>> which actually assigns a "p" pointer to the idr index is called later on. >>>> Until that happens, the idr_find() would return NULL even if the index >>>> is actually allocated. We cannot assign "p" in tcf_idr_check_alloc() >>>> because it is allocated only later on in tcf_idr_create(). But that is >>>> resolvable by the following trick: >>>> >>>> int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index, >>>> struct tc_action **a, int bind) >>>> { >>>> struct tcf_idrinfo *idrinfo = tn->idrinfo; >>>> struct tc_action *p; >>>> int err; >>>> >>>> again: >>>> spin_lock(&idrinfo->lock); >>>> if (*index) { >>>> p = idr_find(&idrinfo->action_idr, *index); >>>> if (IS_ERR(p)) { >>>> /* This means that another process allocated >>>> * index but did not assign the pointer yet. >>>> */ >>>> spin_unlock(&idrinfo->lock); >>>> goto again; >>>> } >>>> if (p) { >>>> if (bind) >>>> p->tcfa_bindcnt++; >>>> p->tcfa_refcnt++; >>>> *a = p; >>>> err = 0; >>>> } else { >>>> *a = NULL; >>>> err = idr_alloc_u32(idr, NULL, index, >>>> *index, GFP_ATOMIC); >>>> idr_replace(&idrinfo->action_idr, >>>> ERR_PTR(-EBUSY), *index); >>>> } >>>> } else { >>>> *index = 1; >>>> *a = NULL; >>>> err = idr_alloc_u32(idr, NULL, index, UINT_MAX, GFP_ATOMIC); >>>> idr_replace(&idrinfo->action_idr, ERR_PTR(-EBUSY), *index); >>>> } >>>> spin_unlock(&idrinfo->lock); >>>> return err; >>>> } >>>> >>> >>>So users of action idr that might perform concurrent lookups are also >>>have to be changed to check for error pointers, that now can be inserted >>>into idr? Seems like a complex change... >> >> You can just add a simple check into tcf_idr_lookup(). Where else? >> > >To me it looks like we take something simple and already working, and >make it complex to save few lines of code in action init... Well it is not only about saving lines of code but also to avoid duplications of code. And duplications lead to bugs. That's why I believe that helper here is needed. > >Anyway, how should I do patch split for this? >Patch to implement function you outlined and another one to modify all >actions to use it(with all refactoring to not leak references)? Or patch >per action is better approach? If you do it per-action, you would end up with many patches. It is not needed I believe.