From: "Reshetova, Elena" <elena.reshetova@intel.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
        Peter Zijlstra <peterz@infradead.org>
CC: Kees Cook <keescook@chromium.org>,
        Greg KH <gregkh@linuxfoundation.org>,
        Will Deacon <will.deacon@arm.com>, Arnd Bergmann <arnd@arndb.de>,
        "Thomas Gleixner" <tglx@linutronix.de>, Ingo Molnar <mingo@kernel.org>,
        "H. Peter Anvin" <hpa@zytor.com>, David Windsor <dave@progbits.org>,
        LKML <linux-kernel@vger.kernel.org>,
        Daniel Borkmann <daniel@iogearbox.net>
Subject: RE: [RFC][PATCH 2/7] kref: Add kref_read()
Thread-Topic: [RFC][PATCH 2/7] kref: Add kref_read()
Thread-Index: AQHSQEVANqFGs6lR9EeBqDMncz0eH6Dc36YAgAB8mgCAAaU8QA==
Date: Fri, 18 Nov 2016 17:33:35 +0000
Message-ID: <2236FBA76BA1254E88B949DDB74E612B41C14BB4@IRSMSX102.ger.corp.intel.com>
References: <CAADnVQJ4mQJ8XCq-fq2hRMchK9Q7zQNET4RqG3LXTcE2TU=k7Q@mail.gmail.com>
 <20161117085342.GB3142@twins.programming.kicks-ass.net>
 <20161117161937.GA46515@ast-mbp.thefacebook.com>
In-Reply-To: <20161117161937.GA46515@ast-mbp.thefacebook.com>
Accept-Language: en-US
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 8BIT
MIME-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1990
Lines: 45

On Thu, Nov 17, 2016 at 09:53:42AM +0100, Peter Zijlstra wrote:
> On Wed, Nov 16, 2016 at 12:08:52PM -0800, Alexei Starovoitov wrote:
> 
> > I prefer to avoid 'fixing' things that are not broken.
> > Note, prog->aux->refcnt already has explicit checks for overflow.
> > locked_vm is used for resource accounting and not refcnt, so I don't 
> > see issues there either.
> 
> The idea is to use something along the lines of:
> 
>   http://lkml.kernel.org/r/20161115104608.GH3142@twins.programming.kicks-ass.net
> 
> for all refcounts in the kernel.

>I understand the idea. I'm advocating to fix refcnts explicitly the way we did in bpf land instead of leaking memory, making processes unkillable and so on.
>If refcnt can be bounds checked, it should be done that way, since it's a clean error path without odd side effects.
>Therefore I'm against unconditionally applying refcount to all atomics.

> Also note that your:
> 
> struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i) {
>         if (atomic_add_return(i, &prog->aux->refcnt) > BPF_MAX_REFCNT) {
>                 atomic_sub(i, &prog->aux->refcnt);
>                 return ERR_PTR(-EBUSY);
>         }
>         return prog;
> }
> 
> is actually broken in the face of an actual overflow. Suppose @i is 
> big enough to wrap refcnt into negative space.

>'i' is not controlled by user. It's a number of nic hw queues and BPF_MAX_REFCNT is 32k, so above is always safe.

If I understand your code correctly, bpf_prog_add() is exported and anyone is free to use it
(some crazy buggy driver, for example).
Currently only drivers/net/ethernet/mellanox/mlx4/en_netdev.c uses it, but from a security
point of view you should consider any externally exposed interface an attack vector.
So I would not claim that the above construction is always safe, since the API gives a
caller a way to supply an "i" that would overflow.
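
To make the concern concrete, here is a minimal userspace sketch (C11 <stdatomic.h>,
not kernel code and not a proposed patch). struct prog, add_return(),
prog_add_check_after(), prog_add_check_before() and refcnt_demo() are hypothetical
names for illustration only; BPF_MAX_REFCNT is set to 32768 on the assumption that
"32k" above means exactly that. The first helper mirrors the add-then-check pattern
quoted above, the second checks the bound before committing the addition:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>

#define BPF_MAX_REFCNT 32768 /* assumption: "32k" from the thread */

/* Hypothetical userspace stand-in for the object carrying the refcount. */
struct prog {
	atomic_int refcnt;
};

/* Rough equivalent of atomic_add_return(): returns the wrapped new value. */
static int add_return(atomic_int *v, int i)
{
	int old = atomic_fetch_add(v, i); /* C11 signed atomics wrap silently */

	/* Unsigned math avoids signed-overflow UB in the plain addition;
	 * the cast back to int is implementation-defined (wraps on gcc/clang). */
	return (int)((unsigned int)old + (unsigned int)i);
}

/* Add-then-check, as in the quoted code: a wrapped (negative) result
 * is not greater than BPF_MAX_REFCNT, so the error path is skipped. */
static int prog_add_check_after(struct prog *p, int i)
{
	if (add_return(&p->refcnt, i) > BPF_MAX_REFCNT) {
		atomic_fetch_sub(&p->refcnt, i);
		return -EBUSY;
	}
	return 0;
}

/* Check-then-add: validate against the bound first and only then
 * commit the new value with a compare-and-exchange loop. */
static int prog_add_check_before(struct prog *p, int i)
{
	int old = atomic_load(&p->refcnt);

	do {
		if (i < 0 || old < 0 || old > BPF_MAX_REFCNT ||
		    i > BPF_MAX_REFCNT - old)
			return -EBUSY;
		/* on failure, old is reloaded and the bound is re-checked */
	} while (!atomic_compare_exchange_weak(&p->refcnt, &old, old + i));

	return 0;
}

/* Usage: a huge "i" slips through the first helper, not the second. */
void refcnt_demo(void)
{
	struct prog a, b;

	atomic_init(&a.refcnt, 1);
	atomic_init(&b.refcnt, 1);

	assert(prog_add_check_after(&a, INT_MAX) == 0); /* accepted */
	assert(atomic_load(&a.refcnt) < 0);             /* count wrapped */

	assert(prog_add_check_before(&b, INT_MAX) == -EBUSY);
	assert(atomic_load(&b.refcnt) == 1);            /* count untouched */
}
```

With a sane "i" (a handful of queues) both helpers behave the same; they differ only
when a caller passes a value large enough to wrap the counter.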

The next question is how to convert the above code sanely to the refcount_t interface... A loop of inc(s)? Eek...