From: Kees Cook
Date: Tue, 21 Mar 2017 13:49:00 -0700
Subject: Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t
To: Peter Zijlstra
Cc: Herbert Xu, David Miller, Eric Dumazet, "Reshetova, Elena",
 Network Development, bridge@lists.linux-foundation.org, LKML,
 Alexey Kuznetsov, James Morris, Patrick McHardy, Stephen Hemminger,
 Hans Liljestrand, David Windsor, Andrew Morton
In-Reply-To: <20170320134017.h3c2jrsnd4guuyu7@hirez.programming.kicks-ass.net>
References: <1489767196.28631.305.camel@edumazet-glaptop3.roam.corp.google.com>
 <20170318164759.GA23837@gondor.apana.org.au>
 <20170318.182121.439615057765380575.davem@davemloft.net>
 <20170320103937.lq7nfnutupr3gkn7@hirez.programming.kicks-ass.net>
 <20170320131629.GA26405@gondor.apana.org.au>
 <20170320132357.acygo3umw6fiwb4p@hirez.programming.kicks-ass.net>
 <20170320132713.GA26954@gondor.apana.org.au>
 <20170320134017.h3c2jrsnd4guuyu7@hirez.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Mar 20, 2017 at 6:40 AM, Peter Zijlstra wrote:
> On Mon, Mar 20, 2017 at 09:27:13PM +0800, Herbert Xu wrote:
>> On Mon, Mar 20, 2017 at 02:23:57PM +0100, Peter Zijlstra wrote:
>> >
>> > So what bench/setup do you want ran?
>>
>> You can start by counting how many cycles an atomic op takes
>> vs. how many cycles this new code takes.
>
> On what uarch?
>
> I think I tested hand coded asm version and it ended up about double the
> cycles for a cmpxchg loop vs the direct instruction on an IVB-EX (until
> the memory bus saturated, at which point they took the same). Newer
> parts will of course have different numbers,
>
> Can't we run some iperf on a 40gbe fiber loop or something? It would be
> very useful to have an actual workload we can run.

Yeah, this is exactly what I'd like to find as well. Just comparing
cycles between refcount implementations, while interesting, doesn't
show us real-world performance changes, which is what we need to
measure.

Is Eric's "20 concurrent 'netperf -t UDP_STREAM'" example (from
elsewhere in this email thread) real-world meaningful enough?

-Kees

-- 
Kees Cook
Pixel Security
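
For anyone following along, the "cmpxchg loop vs the direct instruction"
comparison quoted above can be sketched roughly in userspace C11. This is
only an illustration under stated assumptions: it is not the kernel's
refcount_t code and not the hand-coded asm that was tested. The names
inc_direct(), inc_checked(), inc_checked_void() and ns_per_op() are
hypothetical, and the checked semantics (refuse to increment from zero,
stay pinned at UINT_MAX) are assumed for the sake of the example.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Direct increment: one atomic read-modify-write. On x86 this usually
 * compiles to a single locked instruction. */
static inline void inc_direct(atomic_uint *r)
{
    atomic_fetch_add_explicit(r, 1, memory_order_relaxed);
}

/* Checked increment built on a compare-exchange loop.
 * Assumed semantics (illustrative only): never increment from 0,
 * and never wrap past UINT_MAX. Returns false only when the counter
 * was 0. */
static inline bool inc_checked(atomic_uint *r)
{
    unsigned int old = atomic_load_explicit(r, memory_order_relaxed);

    for (;;) {
        if (old == 0)
            return false;           /* object already released */
        if (old == UINT_MAX)
            return true;            /* saturated: leave it pinned */
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak_explicit(r, &old, old + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
            return true;
    }
}

/* Adapter so both variants share one function-pointer type. */
static void inc_checked_void(atomic_uint *r)
{
    (void)inc_checked(r);
}

/* Single-threaded, uncontended cost estimate in nanoseconds of CPU time
 * per call. The indirect call adds the same overhead to both variants. */
static double ns_per_op(void (*op)(atomic_uint *), atomic_uint *r,
                        unsigned long iters)
{
    clock_t start = clock();

    for (unsigned long i = 0; i < iters; i++)
        op(r);

    clock_t end = clock();
    return (double)(end - start) * 1e9 / CLOCKS_PER_SEC / (double)iters;
}
```

Calling ns_per_op(inc_direct, &r, n) and ns_per_op(inc_checked_void, &r, n)
on a counter initialised to 1 gives the kind of per-operation comparison
discussed above. It only measures the uncontended, single-threaded case,
so it says nothing about a real workload such as the netperf run.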