From: "Reshetova, Elena"
To: Daniel Borkmann, "netdev@vger.kernel.org"
CC: "bridge@lists.linux-foundation.org", "linux-kernel@vger.kernel.org",
	"kuznet@ms2.inr.ac.ru", "jmorris@namei.org", "kaber@trash.net",
	"stephen@networkplumber.org", "peterz@infradead.org",
	"keescook@chromium.org", Hans Liljestrand, "David Windsor",
	"alexei.starovoitov@gmail.com"
Subject: RE: [PATCH 08/17] net: convert sk_filter.refcnt from atomic_t to refcount_t
Date: Fri, 17 Mar 2017 08:02:02 +0000
Message-ID: <2236FBA76BA1254E88B949DDB74E612B41C5A560@IRSMSX102.ger.corp.intel.com>
References: <1489678147-21404-1-git-send-email-elena.reshetova@intel.com>
	<1489678147-21404-9-git-send-email-elena.reshetova@intel.com>
	<58CAB7A1.8060500@iogearbox.net>
In-Reply-To: <58CAB7A1.8060500@iogearbox.net>

> On 03/16/2017 04:28 PM, Elena Reshetova wrote:
> > refcount_t type and corresponding API should be
> > used instead of atomic_t when the variable is used as
> > a reference counter.
> > This allows to avoid accidental
> > refcounter overflows that might lead to use-after-free
> > situations.
> >
> > Signed-off-by: Elena Reshetova
> > Signed-off-by: Hans Liljestrand
> > Signed-off-by: Kees Cook
> > Signed-off-by: David Windsor
> > ---
> >  include/linux/filter.h | 3 ++-
> >  net/core/filter.c      | 7 ++++---
> >  2 files changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > index 8053c38..20247e7 100644
> > --- a/include/linux/filter.h
> > +++ b/include/linux/filter.h
> > @@ -7,6 +7,7 @@
> >  #include
> >
> >  #include
> > +#include
> >  #include
> >  #include
> >  #include
> > @@ -431,7 +432,7 @@ struct bpf_prog {
> >  };
> >
> >  struct sk_filter {
> > -	atomic_t	refcnt;
> > +	refcount_t	refcnt;
> >  	struct rcu_head	rcu;
> >  	struct bpf_prog	*prog;
> >  };
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index ebaeaf2..62267e2 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -928,7 +928,7 @@ static void sk_filter_release_rcu(struct rcu_head *rcu)
> >   */
> >  static void sk_filter_release(struct sk_filter *fp)
> >  {
> > -	if (atomic_dec_and_test(&fp->refcnt))
> > +	if (refcount_dec_and_test(&fp->refcnt))
> >  		call_rcu(&fp->rcu, sk_filter_release_rcu);
> >  }
> >
> > @@ -950,7 +950,7 @@ bool sk_filter_charge(struct sock *sk, struct sk_filter *fp)
> >  	/* same check as in sock_kmalloc() */
> >  	if (filter_size <= sysctl_optmem_max &&
> >  	    atomic_read(&sk->sk_omem_alloc) + filter_size < sysctl_optmem_max) {
> > -		atomic_inc(&fp->refcnt);
> > +		refcount_inc(&fp->refcnt);
> >  		atomic_add(filter_size, &sk->sk_omem_alloc);
> >  		return true;
> >  	}
> > @@ -1179,12 +1179,13 @@ static int __sk_attach_prog(struct bpf_prog *prog, struct sock *sk)
> >  		return -ENOMEM;
> >
> >  	fp->prog = prog;
> > -	atomic_set(&fp->refcnt, 0);
> > +	refcount_set(&fp->refcnt, 1);
> >
> >  	if (!sk_filter_charge(sk, fp)) {
> >  		kfree(fp);
> >  		return -ENOMEM;
> >  	}
> > +	refcount_set(&fp->refcnt, 1);
>
> Regarding
> the two subsequent refcount_set(, 1) that look a bit strange
> due to the sk_filter_charge() having refcount_inc() I presume ... can't
> the refcount API handle such corner case?

Yes, it was exactly because of refcount_inc() from zero in sk_filter_charge().
refcount_inc() would refuse to do an inc from zero for security reasons.
At some point in the past we discussed refcount_inc_not_one(), but it was
decided to be too special a case to support (we really have very few such cases).

> Or alternatively let the
> sk_filter_charge() handle it, for example:
>
> bool __sk_filter_charge(struct sock *sk, struct sk_filter *fp)
> {
> 	u32 filter_size = bpf_prog_size(fp->prog->len);
>
> 	/* same check as in sock_kmalloc() */
> 	if (filter_size <= sysctl_optmem_max &&
> 	    atomic_read(&sk->sk_omem_alloc) + filter_size < sysctl_optmem_max) {
> 		atomic_add(filter_size, &sk->sk_omem_alloc);
> 		return true;
> 	}
> 	return false;
> }
>
> And this goes to filter.h:
>
> bool __sk_filter_charge(struct sock *sk, struct sk_filter *fp);
>
> bool sk_filter_charge(struct sock *sk, struct sk_filter *fp)
> {
> 	bool ret = __sk_filter_charge(sk, fp);
> 	if (ret)
> 		refcount_inc(&fp->refcnt);
> 	return ret;
> }
>
> ... and let __sk_attach_prog() call __sk_filter_charge() and only do
> the second refcount_set()?
>
> >  	old_fp = rcu_dereference_protected(sk->sk_filter,
> >  					   lockdep_sock_is_held(sk));

Oh, yes, this would make it look less awkward. Thank you for the suggestion, Daniel!
I guess we were trying to be less invasive with code changes overall, maybe even too careful...
I will update the patch and send a new version.

Best Regards,
Elena.
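
For readers following the thread, the behaviour discussed above can be modelled in userspace. The sketch below is not the kernel's refcount_t: all toy_* names are hypothetical, saturation handling is left out, and toy_refcount_inc() returns false on zero where the kernel helper would complain instead (an assumption made here so the outcome can be checked). It only illustrates why an increment from zero does not take effect, and why both the posted flow and the flow Daniel suggests should leave the filter holding a single reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy userspace stand-in for refcount_t (hypothetical, not kernel code). */
struct toy_refcount {
	atomic_uint refs;
};

static void toy_refcount_set(struct toy_refcount *r, unsigned int n)
{
	atomic_store(&r->refs, n);
}

static unsigned int toy_refcount_read(struct toy_refcount *r)
{
	return atomic_load(&r->refs);
}

/*
 * Increment unless the counter is zero. A zero counter means the object
 * may already be on its way to being freed, so the increment is refused
 * and the counter is left untouched.
 */
static bool toy_refcount_inc(struct toy_refcount *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));

	return true;
}

/* Drop one reference; returns true when the last reference is gone. */
static bool toy_refcount_dec_and_test(struct toy_refcount *r)
{
	unsigned int old = atomic_fetch_sub(&r->refs, 1);

	assert(old > 0); /* dropping a reference nobody holds is a bug */
	return old == 1;
}

/* Flow in the posted patch: set to 1, charge increments, set to 1 again. */
static unsigned int toy_attach_as_posted(void)
{
	struct toy_refcount rc = { 0 };

	toy_refcount_set(&rc, 1);	/* avoid an increment from zero */
	toy_refcount_inc(&rc);		/* models refcount_inc() in sk_filter_charge() */
	toy_refcount_set(&rc, 1);	/* undo the extra reference */
	return toy_refcount_read(&rc);
}

/* Suggested flow: the charge helper leaves the counter alone, one set. */
static unsigned int toy_attach_as_suggested(void)
{
	struct toy_refcount rc = { 0 };

	/* __sk_filter_charge() would only do memory accounting here */
	toy_refcount_set(&rc, 1);
	return toy_refcount_read(&rc);
}
```

Calling toy_refcount_inc() on a counter that was set to 0 returns false and leaves it at 0, which is the situation the first refcount_set(&fp->refcnt, 1) in the posted patch works around. Both toy_attach_*() helpers end with a count of 1; the suggested variant gets there with a single set.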