Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753440AbcKRReC convert rfc822-to-8bit (ORCPT ); Fri, 18 Nov 2016 12:34:02 -0500 Received: from mga01.intel.com ([192.55.52.88]:1150 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752136AbcKRReA (ORCPT ); Fri, 18 Nov 2016 12:34:00 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.31,510,1473145200"; d="scan'208";a="1087197714" From: "Reshetova, Elena" To: Alexei Starovoitov , Peter Zijlstra CC: Kees Cook , Greg KH , Will Deacon , Arnd Bergmann , "Thomas Gleixner" , Ingo Molnar , "H. Peter Anvin" , David Windsor , LKML , Daniel Borkmann Subject: RE: [RFC][PATCH 2/7] kref: Add kref_read() Thread-Topic: [RFC][PATCH 2/7] kref: Add kref_read() Thread-Index: AQHSQEVANqFGs6lR9EeBqDMncz0eH6Dc36YAgAB8mgCAAaU8QA== Date: Fri, 18 Nov 2016 17:33:35 +0000 Message-ID: <2236FBA76BA1254E88B949DDB74E612B41C14BB4@IRSMSX102.ger.corp.intel.com> References: <20161117085342.GB3142@twins.programming.kicks-ass.net> <20161117161937.GA46515@ast-mbp.thefacebook.com> In-Reply-To: <20161117161937.GA46515@ast-mbp.thefacebook.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.180] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 1990 Lines: 45 On Thu, Nov 17, 2016 at 09:53:42AM +0100, Peter Zijlstra wrote: > On Wed, Nov 16, 2016 at 12:08:52PM -0800, Alexei Starovoitov wrote: > > > I prefer to avoid 'fixing' things that are not broken. > > Note, prog->aux->refcnt already has explicit checks for overflow. > > locked_vm is used for resource accounting and not refcnt, so I don't > > see issues there either. > > The idea is to use something along the lines of: > > > http://lkml.kernel.org/r/20161115104608.GH3142@twins.programming.kicks > -ass.net > > for all refcounts in the kernel. >I understand the idea. I'm advocating to fix refcnts explicitly the way we did in bpf land instead of leaking memory, making processes unkillable and so on. >If refcnt can be bounds checked, it should be done that way, since it's a clean error path without odd side effects. >Therefore I'm against unconditionally applying refcount to all atomics. > Also note that your: > > struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i) { > if (atomic_add_return(i, &prog->aux->refcnt) > BPF_MAX_REFCNT) { > atomic_sub(i, &prog->aux->refcnt); > return ERR_PTR(-EBUSY); > } > return prog; > } > > is actually broken in the face of an actual overflow. Suppose @i is > big enough to wrap refcnt into negative space. >'i' is not controlled by user. It's a number of nic hw queues and BPF_MAX_REFCNT is 32k, so above is always safe. If I understand your code right, you export the bpf_prog_add() and anyone is free to use it (some crazy buggy driver for example). Currently only drivers/net/ethernet/mellanox/mlx4/en_netdev.c uses it, but you should consider any externally exposed interface as an attack vector from security point of view. So, I would not claim that above construction is always safe since there is a way using API to supply "i" that would overflow. Next question is how to convert the above code sanely to refcount_t interface... Loop of inc(s)? Iikk...