Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933009AbcKVKhj (ORCPT ); Tue, 22 Nov 2016 05:37:39 -0500
Received: from merlin.infradead.org ([205.233.59.134]:57822 "EHLO
	merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932989AbcKVKhh (ORCPT );
	Tue, 22 Nov 2016 05:37:37 -0500
Date: Tue, 22 Nov 2016 11:37:21 +0100
From: Peter Zijlstra
To: David Windsor
Cc: "Reshetova, Elena", Alexei Starovoitov, Kees Cook, Greg KH,
	Will Deacon, Arnd Bergmann, Thomas Gleixner, Ingo Molnar,
	"H. Peter Anvin", LKML, Daniel Borkmann
Subject: Re: [RFC][PATCH 2/7] kref: Add kref_read()
Message-ID: <20161122103721.GO3092@twins.programming.kicks-ass.net>
References: <20161117085342.GB3142@twins.programming.kicks-ass.net>
	<20161117161937.GA46515@ast-mbp.thefacebook.com>
	<2236FBA76BA1254E88B949DDB74E612B41C14BB4@IRSMSX102.ger.corp.intel.com>
	<2236FBA76BA1254E88B949DDB74E612B41C15583@IRSMSX102.ger.corp.intel.com>
	<20161121154915.GB3124@twins.programming.kicks-ass.net>
	<20161121160059.GB3174@twins.programming.kicks-ass.net>
	<2236FBA76BA1254E88B949DDB74E612B41C156AE@IRSMSX102.ger.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23.1 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2146
Lines: 59

On Mon, Nov 21, 2016 at 03:12:33PM -0500, David Windsor wrote:
> On Mon, Nov 21, 2016 at 2:27 PM, Reshetova, Elena
> wrote:
> >> On Mon, Nov 21, 2016 at 04:49:15PM +0100, Peter Zijlstra wrote:
> >> > > Speaking of non-fitting patterns. This one is quite common in
> >> > > networking code for refcounters:
> >> > >
> >> > > if (atomic_cmpxchg(&cur->refcnt, 1, 0) == 1) {} This is from
> >> > > net/netfilter/nfnetlink_acct.c, but there are similar ones in other
> >> > > places.
> >> >
> >> > Cute, but weird it doesn't actually decrement if not 1.
> >>
> >> Hurgh.. creative refcounting that. The question is how much of that do
> >> we want to support?

It really must not decrement there.

Now, arguably the 1->0 case is special, and we can provide limited
support for that, but I'd be hesitant to provide the full cmpxchg.

We could for instance provide: refcount_dec_if_one().

> > And one more creative usage:
> >
> > http://lxr.free-electrons.com/source/net/ipv4/udp.c#L1940
> >
> > if (!sk || !atomic_inc_not_zero_hint(&sk->sk_refcnt, 2))
> >     return;
> >
> > I didn't even guess anyone is using atomic_inc_not_zero_hint...
> > But network code keeps surprising me today :)
> > So, yes, I guess the question is what to do with these cases really?
>
> Many of the calls to non-supported functions can be decomposed into
> calls to supported functions.

So it really depends on what the network guys are willing to put up
with, if their primary goal is to avoid the SHARED state, we could add a
load-exclusive. But I suspect they'd not be happy with that either...

> The ones that may prove interesting are
> ones like atomic_cmpxchg(), in which some sort of external locking is
> going to be required to achieve the same atomicity guarantees provided
> by cmpxchg, like so:
>
> mutex_lock(lock);
> cnt = refcount_read(ref);
> if (cnt == val1) {
>     refcount_set(ref, val2);
> }
> mutex_unlock(lock);
> return cnt;

That cannot actually work in the presence of actual atomic instructions
not serialized by that lock.

Also, the network guys will absolutely kill you if you propose something
like that.
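
For illustration only, the 1->0 special case mentioned above could look
roughly like the following. This is a user-space sketch using C11
<stdatomic.h>, not kernel code; the type refcount_sketch_t and the
function refcount_sketch_dec_if_one() are hypothetical names invented
for this example, and only the idea of a "dec if one" helper comes from
the mail. It performs the same single compare-and-exchange as the
atomic_cmpxchg(&cur->refcnt, 1, 0) == 1 pattern quoted earlier, so it
needs no external lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a reference count (assumption,
 * not the kernel's type). */
typedef struct {
	atomic_int refs;
} refcount_sketch_t;

/*
 * Drop the final reference only: if the count is exactly 1, move it to 0
 * and return true. For any other value the count is left untouched and
 * false is returned. The check and the store happen in one atomic
 * compare-and-exchange, so concurrent atomic updates cannot slip in
 * between them.
 */
static bool refcount_sketch_dec_if_one(refcount_sketch_t *r)
{
	int expected = 1;

	return atomic_compare_exchange_strong(&r->refs, &expected, 0);
}
```

A caller would free the object only when the helper returns true. Note
that, like the netfilter pattern, it deliberately does not decrement
when the count is anything other than 1.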