Subject: Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
From: Eric Dumazet
To: Peter Zijlstra
Cc: Herbert Xu, David Miller, elena.reshetova@intel.com, keescook@chromium.org,
    netdev@vger.kernel.org, bridge@lists.linux-foundation.org,
    linux-kernel@vger.kernel.org, kuznet@ms2.inr.ac.ru, jmorris@namei.org,
    kaber@trash.net, stephen@networkplumber.org, ishkamiel@gmail.com,
    dwindsor@gmail.com, akpm@linux-foundation.org
Date: Mon, 20 Mar 2017 07:51:01 -0700
Message-ID: <1490021461.16816.52.camel@edumazet-glaptop3.roam.corp.google.com>
In-Reply-To: <20170320134017.h3c2jrsnd4guuyu7@hirez.programming.kicks-ass.net>

On Mon, 2017-03-20 at 14:40 +0100, Peter Zijlstra wrote:
> On Mon, Mar 20, 2017 at 09:27:13PM +0800, Herbert Xu wrote:
> > On Mon, Mar 20, 2017 at 02:23:57PM +0100, Peter Zijlstra wrote:
> > >
> > > So what bench/setup do you want ran?
> >
> > You can start by counting how many cycles an atomic op takes
> > vs. how many cycles this new code takes.
>
> On what uarch?
>
> I think I tested hand coded asm version and it ended up about double the
> cycles for a cmpxchg loop vs the direct instruction on an IVB-EX (until
> the memory bus saturated, at which point they took the same). Newer
> parts will of course have different numbers,
>
> Can't we run some iperf on a 40gbe fiber loop or something? It would be
> very useful to have an actual workload we can run.

If atomic ops are converted one by one, it is likely that results will
be noise.

We cannot start a global conversion without having a way to enable
debugging selectively?

Then, adopting this fine infra would really not be a problem.

Some arches have an efficient atomic_inc() (no full barriers), while a
"load + test + atomic_cmpxchg() + test + loop" sequence is more expensive.

PowerPC has no efficient atomic_inc(), and this definitely shows on
network-intensive workloads involving concurrent cores/threads.

atomic_cmpxchg() on PowerPC is horribly more expensive because of the
two added SYNC instructions.

Networking performance is quite poor on PowerPC as of today.
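
To make the comparison above concrete, here is a minimal userspace sketch of the two increment shapes being discussed: a single atomic increment versus a "load + test + cmpxchg + test + loop" sequence. It uses C11 <stdatomic.h>, not the kernel's atomic_t or refcount_t code. The names plain_inc() and checked_inc_not_zero() are hypothetical, and the zero and saturation checks are assumptions standing in for the kind of tests a checked refcount performs.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Direct increment: one atomic read-modify-write with relaxed ordering,
 * i.e. no full barriers. (Hypothetical helper, not a kernel API.)
 */
static inline void plain_inc(atomic_uint *v)
{
	atomic_fetch_add_explicit(v, 1, memory_order_relaxed);
}

/*
 * Checked increment: load + test + cmpxchg + test + loop.
 * Returns false if the counter was already zero.
 * Assumption: the counter sticks at UINT_MAX instead of wrapping.
 */
static inline bool checked_inc_not_zero(atomic_uint *v)
{
	unsigned int old = atomic_load_explicit(v, memory_order_relaxed);

	for (;;) {
		if (old == 0)
			return false;	/* object already released */
		if (old == UINT_MAX)
			return true;	/* saturated: leave the value alone */
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak_explicit(v, &old, old + 1,
							  memory_order_relaxed,
							  memory_order_relaxed))
			return true;
	}
}
```

Even with relaxed ordering, the second form needs an extra load, two branches and a retry loop around the compare-exchange. Timing both helpers in a tight loop on a given uarch would give the per-op cycle comparison asked for earlier in the thread, though not the workload-level numbers.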