Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754723AbcKPKMX (ORCPT ); Wed, 16 Nov 2016 05:12:23 -0500 Received: from www62.your-server.de ([213.133.104.62]:57566 "EHLO www62.your-server.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752835AbcKPKMV (ORCPT ); Wed, 16 Nov 2016 05:12:21 -0500 Message-ID: <582C30DF.60205@iogearbox.net> Date: Wed, 16 Nov 2016 11:11:43 +0100 From: Daniel Borkmann User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Greg KH , Kees Cook CC: Peter Zijlstra , Will Deacon , "Reshetova, Elena" , Arnd Bergmann , Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" , David Windsor , LKML , Alexei Starovoitov Subject: Re: [RFC][PATCH 2/7] kref: Add kref_read() References: <20161114173946.501528675@infradead.org> <20161114174446.486581399@infradead.org> <20161115073322.GC28248@kroah.com> <20161115080314.GD3142@twins.programming.kicks-ass.net> <20161116082151.GA24017@kroah.com> In-Reply-To: <20161116082151.GA24017@kroah.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Authenticated-Sender: daniel@iogearbox.net Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3573 Lines: 90 On 11/16/2016 09:21 AM, Greg KH wrote: > On Tue, Nov 15, 2016 at 12:53:35PM -0800, Kees Cook wrote: >> On Tue, Nov 15, 2016 at 12:03 AM, Peter Zijlstra wrote: >>> On Tue, Nov 15, 2016 at 08:33:22AM +0100, Greg KH wrote: >>>> On Mon, Nov 14, 2016 at 06:39:48PM +0100, Peter Zijlstra wrote: >>> >>>>> --- a/drivers/block/drbd/drbd_req.c >>>>> +++ b/drivers/block/drbd/drbd_req.c >>>>> @@ -520,7 +520,7 @@ static void mod_rq_state(struct drbd_req >>>>> /* Completion does it's own kref_put. If we are going to >>>>> * kref_sub below, we need req to be still around then. */ >>>>> int at_least = k_put + !!c_put; >>>>> - int refcount = atomic_read(&req->kref.refcount); >>>>> + int refcount = kref_read(&req->kref); >>>>> if (refcount < at_least) >>>>> drbd_err(device, >>>>> "mod_rq_state: Logic BUG: %x -> %x: refcount = %d, should be >= %d\n", >>>> >>>> As proof of "things you should never do", here is one such example. >>>> >>>> ugh. >>>> >>>>> --- a/drivers/block/virtio_blk.c >>>>> +++ b/drivers/block/virtio_blk.c >>>>> @@ -767,7 +767,7 @@ static void virtblk_remove(struct virtio >>>>> /* Stop all the virtqueues. */ >>>>> vdev->config->reset(vdev); >>>>> >>>>> - refc = atomic_read(&disk_to_dev(vblk->disk)->kobj.kref.refcount); >>>>> + refc = kref_read(&disk_to_dev(vblk->disk)->kobj.kref); >>>>> put_disk(vblk->disk); >>>>> vdev->config->del_vqs(vdev); >>>>> kfree(vblk->vqs); >>>> >>>> And this too, ugh, that's a huge abuse and is probably totally wrong... >>>> >>>> thanks again for digging through this crap. I wonder if we need to name >>>> the kref reference variable "do_not_touch_this_ever" or some such thing >>>> to catch all of the people who try to be "too smart". >>> >>> There's unimaginable bong hits involved in this stuff, in the end I >>> resorted to brute force and scripts to convert all this. >> >> What should we do about things like this (bpf_prog_put() and callbacks >> from kernel/bpf/syscall.c): Just reading up on this series. Your question refers to converting bpf prog and map ref counts to Peter's refcount_t eventually, right? >> static void bpf_prog_uncharge_memlock(struct bpf_prog *prog) >> { >> struct user_struct *user = prog->aux->user; >> >> atomic_long_sub(prog->pages, &user->locked_vm); > > Oh that's scary. Let's just make one reference count rely on another > one and not check things... Sorry, could you elaborate what you mean by 'check things', you mean for wrap around? IIUC, back then accounting was roughly similar modeled after perf event's one, and in this case accounts for pages used by progs and maps during their life-time. Are you suggesting that this approach is inherently broken? >> free_uid(user); >> } >> >> static void __bpf_prog_put_rcu(struct rcu_head *rcu) >> { >> struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu); >> >> free_used_maps(aux); >> bpf_prog_uncharge_memlock(aux->prog); >> bpf_prog_free(aux->prog); >> } >> >> void bpf_prog_put(struct bpf_prog *prog) >> { >> if (atomic_dec_and_test(&prog->aux->refcnt)) >> call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu); >> } >> >> >> Not only do we want to protect prog->aux->refcnt, but I think we want >> to protect user->locked_vm too ... I don't think it's sane for >> user->locked_vm to be a stats_t ? > > I don't think this is sane code... > > greg k-h