Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752694AbcKSDrg (ORCPT ); Fri, 18 Nov 2016 22:47:36 -0500 Received: from mail-pg0-f66.google.com ([74.125.83.66]:35996 "EHLO mail-pg0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752234AbcKSDrf (ORCPT ); Fri, 18 Nov 2016 22:47:35 -0500 Date: Fri, 18 Nov 2016 19:47:29 -0800 From: Alexei Starovoitov To: "Reshetova, Elena" Cc: Peter Zijlstra , Kees Cook , Greg KH , Will Deacon , Arnd Bergmann , Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" , David Windsor , LKML , Daniel Borkmann Subject: Re: [RFC][PATCH 2/7] kref: Add kref_read() Message-ID: <20161119034727.GA24370@ast-mbp.thefacebook.com> References: <20161117085342.GB3142@twins.programming.kicks-ass.net> <20161117161937.GA46515@ast-mbp.thefacebook.com> <2236FBA76BA1254E88B949DDB74E612B41C14BB4@IRSMSX102.ger.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2236FBA76BA1254E88B949DDB74E612B41C14BB4@IRSMSX102.ger.corp.intel.com> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2147 Lines: 45 On Fri, Nov 18, 2016 at 05:33:35PM +0000, Reshetova, Elena wrote: > On Thu, Nov 17, 2016 at 09:53:42AM +0100, Peter Zijlstra wrote: > > On Wed, Nov 16, 2016 at 12:08:52PM -0800, Alexei Starovoitov wrote: > > > > > I prefer to avoid 'fixing' things that are not broken. > > > Note, prog->aux->refcnt already has explicit checks for overflow. > > > locked_vm is used for resource accounting and not refcnt, so I don't > > > see issues there either. > > > > The idea is to use something along the lines of: > > > > > > http://lkml.kernel.org/r/20161115104608.GH3142@twins.programming.kicks > > -ass.net > > > > for all refcounts in the kernel. > > >I understand the idea. I'm advocating to fix refcnts explicitly the way we did in bpf land instead of leaking memory, making processes unkillable and so on. > >If refcnt can be bounds checked, it should be done that way, since it's a clean error path without odd side effects. > >Therefore I'm against unconditionally applying refcount to all atomics. > > > Also note that your: > > > > struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i) { > > if (atomic_add_return(i, &prog->aux->refcnt) > BPF_MAX_REFCNT) { > > atomic_sub(i, &prog->aux->refcnt); > > return ERR_PTR(-EBUSY); > > } > > return prog; > > } > > > > is actually broken in the face of an actual overflow. Suppose @i is > > big enough to wrap refcnt into negative space. > > >'i' is not controlled by user. It's a number of nic hw queues and BPF_MAX_REFCNT is 32k, so above is always safe. > > If I understand your code right, you export the bpf_prog_add() and anyone is free to use it > (some crazy buggy driver for example). > Currently only drivers/net/ethernet/mellanox/mlx4/en_netdev.c uses it, but you should > consider any externally exposed interface as an attack vector from security point of view. It's not realistic to harden all export_symbol apis. Code review for in-tree modules is the only defense we have. Remember out of tree perf counter issues... nothing perf core can do about that. If it's out of tree, it's vendor's problem to fix it.