Received: by 2002:ac0:a594:0:0:0:0:0 with SMTP id m20-v6csp2054871imm; Wed, 16 May 2018 07:13:56 -0700 (PDT) X-Google-Smtp-Source: AB8JxZqtdVI1ZPzLBMPX6IzeFZujDI4l/ssqkI1qESm8qd+A5LOC2CAaiuM8JWIe5hFD3dRzuYFl X-Received: by 2002:a65:5105:: with SMTP id f5-v6mr909058pgq.232.1526480036368; Wed, 16 May 2018 07:13:56 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1526480036; cv=none; d=google.com; s=arc-20160816; b=RRKdvyTvkyqf0ue9hrWLI3SCxanphvw79bRI4L5/Kkwa2vqGbqcTySOMiSGgwWfMqs T/19gi4V2k8YcBIrSmfWu83kXGzRO3RaejwB7xAFRLYkyIj/F4/6N+01lYrVYAm2yqqq 2LQtXV4y2HklLoh62PazdndOm6MlYd1p7KIOA1GisxEm1cNEnlSu+x4/q0UFs+CjSzhe n43l29FeKBDh0775e2MSz15N9W7yD4ug1b0hbC1LvyRpe5E1miiF4qzVknn0DpbM5unO AKSANKHVVTCTdJCr5WkpItMnusQaj2ePe+e8Ww+3TkwXHDZh5i+0LoLluZh+W8rTf98x GSvg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature:arc-authentication-results; bh=QWSowJLaWhHI9RogtQmFIuy4ec9GFS1TqPj6sfhqU0I=; b=yl5OXp/T6mJ0anmeH2+cd7lPbOWUuWVNLBi7KQFwaUS9Lin98V3H267Fs85M035x94 HoceSOvv0CEOO/zip+KnCBzAlOYjkUmcOmeWqnAou2FCF30f+9pwAjxXaGeVCh9OvKB4 VIABvU8/lBj5z4NzctTHpfx1P2rBBpMTiZeSuReH4NyBijXSYobE9z6VYalTzQhPVjGH bgbdz4VBtjYJ4AxN9hk0XHzZKQ+XLYZPQp1Lpw8OaT2QLzyWiN2lseMcB960voX11gS/ nG89ruHeVkmhAiALQVa+j4jwkSR7TqCXM8MYIXVZWJTfD7hg6BiiJuFYmzfcl4tOp+Z5 SNGA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@resnulli-us.20150623.gappssmtp.com header.s=20150623 header.b=1InN0Kz1; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f34-v6si2577097ple.52.2018.05.16.07.13.41; Wed, 16 May 2018 07:13:56 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@resnulli-us.20150623.gappssmtp.com header.s=20150623 header.b=1InN0Kz1; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752410AbeEPONX (ORCPT + 99 others); Wed, 16 May 2018 10:13:23 -0400 Received: from mail-wm0-f66.google.com ([74.125.82.66]:36627 "EHLO mail-wm0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751595AbeEPONV (ORCPT ); Wed, 16 May 2018 10:13:21 -0400 Received: by mail-wm0-f66.google.com with SMTP id n10-v6so2141530wmc.1 for ; Wed, 16 May 2018 07:13:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=resnulli-us.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=QWSowJLaWhHI9RogtQmFIuy4ec9GFS1TqPj6sfhqU0I=; b=1InN0Kz1mYrTElz90GRHReuV595EPmgDPMn4itleWHVzKu3s7cNfMwmtY0LXdY5tJ4 V3QiR44IlDGd1OQhEFE4YQAk+oBfufGzn9dxIhQpeluR+s+dHilkMxjyhNX9jmV4GOwG YOx6xzhWDfZIZpHpc5ABxK3CQbHivi7CBUu4qZe5kuI37CsrTi1dy0N/cH5cUm7FLmdA 8OindECLOs821LlvNd/fqiaWQOqFzPiiQ9caoxI1E7z5BhbnYGoNccxEdMsnHAs+HbG6 KUYP9137RgG8WdvnXKzaTCEcKN6cj/6QgxcoMdecrmPDtrpICE4QOIISeJvjPexY8XoD UGJg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=QWSowJLaWhHI9RogtQmFIuy4ec9GFS1TqPj6sfhqU0I=; b=GrKEXOJcbc4toQqoel7EUCbXCMxXgTyriVAfrfb0zR2WJsdRzvMZEVeXf/JtjKGvE+ 9yK31iP+EwbLwARPQ9xJpjrm1Fouv/gGuDS2jJGS2PeNhS7HiZVIX8ckSuNqqnaJzuTj UeLiAtYyFUuYQriKX1Ro0q80kGCt/JE2CNahk6KRxMG1Xf9QolGWgil+/ThI/R7bfpMv izmgoSMgV+8GPVRiQYxnw1rht6KDkdypI8qVVUaA/naT/vtjacQJbOXIryDDp8sfLZuG Am9TCldKfT1CYrtCtPHrRmfYKSQy2af4Fo+GgxxNFTABKpWFz8slyVqTwKsS7hb17lj+ l+kg== X-Gm-Message-State: ALKqPwf7EjqnlA7ITp3qWLiuW1se88bZ7N9jZpIw9dyAdR/v3G+ke5Tw NPu8ZoKs/4EYM4DurZ3gutTGkg== X-Received: by 2002:a1c:3313:: with SMTP id z19-v6mr751909wmz.48.1526480000118; Wed, 16 May 2018 07:13:20 -0700 (PDT) Received: from localhost (ip-94-113-127-8.net.upcbroadband.cz. [94.113.127.8]) by smtp.gmail.com with ESMTPSA id p33-v6sm4707804wrc.14.2018.05.16.07.13.19 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Wed, 16 May 2018 07:13:19 -0700 (PDT) Date: Wed, 16 May 2018 16:13:19 +0200 From: Jiri Pirko To: Vlad Buslov Cc: netdev@vger.kernel.org, davem@davemloft.net, jhs@mojatatu.com, xiyou.wangcong@gmail.com, pablo@netfilter.org, kadlec@blackhole.kfki.hu, fw@strlen.de, ast@kernel.org, daniel@iogearbox.net, edumazet@google.com, keescook@chromium.org, linux-kernel@vger.kernel.org, netfilter-devel@vger.kernel.org, coreteam@netfilter.org, kliteyn@mellanox.com Subject: Re: [PATCH 12/14] net: sched: retry action check-insert on concurrent modification Message-ID: <20180516141319.GO1972@nanopsycho> References: <1526308035-12484-1-git-send-email-vladbu@mellanox.com> <1526308035-12484-13-git-send-email-vladbu@mellanox.com> <20180516095953.GI1972@nanopsycho> <20180516122600.GM1972@nanopsycho> <20180516132135.GN1972@nanopsycho> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.9.2 (2017-12-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Wed, May 16, 2018 at 03:52:20PM CEST, vladbu@mellanox.com wrote: > >On Wed 16 May 2018 at 13:21, Jiri Pirko wrote: >> Wed, May 16, 2018 at 02:43:58PM CEST, vladbu@mellanox.com wrote: >>> >>>On Wed 16 May 2018 at 12:26, Jiri Pirko wrote: >>>> Wed, May 16, 2018 at 01:55:06PM CEST, vladbu@mellanox.com wrote: >>>>> >>>>>On Wed 16 May 2018 at 09:59, Jiri Pirko wrote: >>>>>> Mon, May 14, 2018 at 04:27:13PM CEST, vladbu@mellanox.com wrote: >>>>>>>Retry check-insert sequence in action init functions if action with same >>>>>>>index was inserted concurrently. >>>>>>> >>>>>>>Signed-off-by: Vlad Buslov >>>>>>>--- >>>>>>> net/sched/act_bpf.c | 8 +++++++- >>>>>>> net/sched/act_connmark.c | 8 +++++++- >>>>>>> net/sched/act_csum.c | 8 +++++++- >>>>>>> net/sched/act_gact.c | 8 +++++++- >>>>>>> net/sched/act_ife.c | 8 +++++++- >>>>>>> net/sched/act_ipt.c | 8 +++++++- >>>>>>> net/sched/act_mirred.c | 8 +++++++- >>>>>>> net/sched/act_nat.c | 8 +++++++- >>>>>>> net/sched/act_pedit.c | 8 +++++++- >>>>>>> net/sched/act_police.c | 9 ++++++++- >>>>>>> net/sched/act_sample.c | 8 +++++++- >>>>>>> net/sched/act_simple.c | 9 ++++++++- >>>>>>> net/sched/act_skbedit.c | 8 +++++++- >>>>>>> net/sched/act_skbmod.c | 8 +++++++- >>>>>>> net/sched/act_tunnel_key.c | 9 ++++++++- >>>>>>> net/sched/act_vlan.c | 9 ++++++++- >>>>>>> 16 files changed, 116 insertions(+), 16 deletions(-) >>>>>>> >>>>>>>diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c >>>>>>>index 5554bf7..7e20fdc 100644 >>>>>>>--- a/net/sched/act_bpf.c >>>>>>>+++ b/net/sched/act_bpf.c >>>>>>>@@ -299,10 +299,16 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla, >>>>>>> >>>>>>> parm = nla_data(tb[TCA_ACT_BPF_PARMS]); >>>>>>> >>>>>>>+replay: >>>>>>> if (!tcf_idr_check(tn, parm->index, act, bind)) { >>>>>>> ret = tcf_idr_create(tn, parm->index, est, act, >>>>>>> &act_bpf_ops, bind, true); >>>>>>>- if (ret < 0) >>>>>>>+ /* Action with specified index was created concurrently. >>>>>>>+ * Check again. >>>>>>>+ */ >>>>>>>+ if (parm->index && ret == -ENOSPC) >>>>>>>+ goto replay; >>>>>>>+ else if (ret) >>>>>> >>>>>> Hmm, looks like you are doing the same/very similar thing in every act >>>>>> code. I think it would make sense to introduce a helper function for >>>>>> this purpose. >>>>> >>>>>This code uses goto so it can't be easily refactored into standalone >>>>>function. Could you specify which part of this code you suggest to >>>>>extract? >>>> >>>> Hmm, looking at the code, I think that what would help is to have a >>>> helper that would atomically check if index exists and if not, it would >>>> allocate one. Something like: >>>> >>>> >>>> int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index, >>>> struct tc_action **a, int bind) >>>> { >>>> struct tcf_idrinfo *idrinfo = tn->idrinfo; >>>> struct tc_action *p; >>>> int err; >>>> >>>> spin_lock(&idrinfo->lock); >>>> if (*index) { >>>> p = idr_find(&idrinfo->action_idr, *index); >>>> if (p) { >>>> if (bind) >>>> p->tcfa_bindcnt++; >>>> p->tcfa_refcnt++; >>>> *a = p; >>>> err = 0; >>>> } else { >>>> *a = NULL; >>>> err = idr_alloc_u32(idr, NULL, index, >>>> *index, GFP_ATOMIC); >>>> } >>>> } else { >>>> *index = 1; >>>> *a = NULL; >>>> err = idr_alloc_u32(idr, NULL, index, UINT_MAX, GFP_ATOMIC); >>>> } >>>> spin_unlock(&idrinfo->lock); >>>> return err; >>>> } >>>> >>>> The act code would just check if "a" is NULL and if so, it would call >>>> tcf_idr_create() with allocated index as arg. >>> >>>What about multiple actions that have arbitrary code between initial >>>check and idr allocation that is currently inside tcf_idr_create()? >> >> Why it would be a problem to have them after the allocation? > >Lets look at mirred for exmple: > exists = tcf_idr_check(tn, parm->index, a, bind); > if (exists && bind) > return 0; > > switch (parm->eaction) { > case TCA_EGRESS_MIRROR: > case TCA_EGRESS_REDIR: > case TCA_INGRESS_REDIR: > case TCA_INGRESS_MIRROR: > break; > default: > if (exists) > tcf_idr_release(*a, bind); > NL_SET_ERR_MSG_MOD(extack, "Unknown mirred option"); > return -EINVAL; > } > if (parm->ifindex) { > dev = __dev_get_by_index(net, parm->ifindex); > if (dev == NULL) { > if (exists) > tcf_idr_release(*a, bind); > return -ENODEV; > } > mac_header_xmit = dev_is_mac_header_xmit(dev); > } else { > dev = NULL; > } > > if (!exists) { > if (!dev) { > NL_SET_ERR_MSG_MOD(extack, "Specified device does not exist"); > return -EINVAL; > } > ret = tcf_idr_create(tn, parm->index, est, a, > &act_mirred_ops, bind, true); > /* Action with specified index was created concurrently. > * Check again. > */ > if (parm->index && ret == -ENOSPC) > goto replay; > else if (ret) > return ret; > >There are several returns and cleanup is only performed when action >exists. So all code like that will have to be audited to also remove >index from idr, otherwise idr handles leak on return. Yeah. You have to take care of the error path. > >> >> There is one issue though with my draft. tcf_idr_insert() function >> which actually assigns a "p" pointer to the idr index is called later on. >> Until that happens, the idr_find() would return NULL even if the index >> is actually allocated. We cannot assign "p" in tcf_idr_check_alloc() >> because it is allocated only later on in tcf_idr_create(). But that is >> resolvable by the following trick: >> >> int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index, >> struct tc_action **a, int bind) >> { >> struct tcf_idrinfo *idrinfo = tn->idrinfo; >> struct tc_action *p; >> int err; >> >> again: >> spin_lock(&idrinfo->lock); >> if (*index) { >> p = idr_find(&idrinfo->action_idr, *index); >> if (IS_ERR(p)) { >> /* This means that another process allocated >> * index but did not assign the pointer yet. >> */ >> spin_unlock(&idrinfo->lock); >> goto again; >> } >> if (p) { >> if (bind) >> p->tcfa_bindcnt++; >> p->tcfa_refcnt++; >> *a = p; >> err = 0; >> } else { >> *a = NULL; >> err = idr_alloc_u32(idr, NULL, index, >> *index, GFP_ATOMIC); >> idr_replace(&idrinfo->action_idr, >> ERR_PTR(-EBUSY), *index); >> } >> } else { >> *index = 1; >> *a = NULL; >> err = idr_alloc_u32(idr, NULL, index, UINT_MAX, GFP_ATOMIC); >> idr_replace(&idrinfo->action_idr, ERR_PTR(-EBUSY), *index); >> } >> spin_unlock(&idrinfo->lock); >> return err; >> } >> > >So users of action idr that might perform concurrent lookups are also >have to be changed to check for error pointers, that now can be inserted >into idr? Seems like a complex change... You can just add a simple check into tcf_idr_lookup(). Where else? > >> >> >> >> >>> >>>> >>>> >>>>> >>>>>> >>>>>> [...] >>>>> >>> >