Date: Mon, 7 Mar 2016 12:24:31 -0600 (CST)
From: Christoph Lameter
To: Waiman Long
cc: Tejun Heo, Dave Chinner, xfs@oss.sgi.com, linux-kernel@vger.kernel.org,
    Ingo Molnar, Peter Zijlstra, Scott J Norton, Douglas Hatch
Subject: Re: [RFC PATCH 1/2] percpu_counter: Allow falling back to global
    counter on large system
In-Reply-To: <1457146299-1601-2-git-send-email-Waiman.Long@hpe.com>
References: <1457146299-1601-1-git-send-email-Waiman.Long@hpe.com>
    <1457146299-1601-2-git-send-email-Waiman.Long@hpe.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 4 Mar 2016, Waiman Long wrote:

> This patch provides a mechanism to selectively degenerate per-cpu
> counters to global counters at per-cpu counter initialization time. The
> following new API is added:
>
>   percpu_counter_set_limit(struct percpu_counter *fbc,
>                            u32 percpu_limit)
>
> The function should be called after percpu_counter_set(). It will
> compare the total limit (nr_cpu * percpu_limit) against the current
> counter value. If the limit is not smaller, it will disable per-cpu
> counter and use only the global counter instead. At run time, when
> the counter value grows past the total limit, per-cpu counter will
> be enabled again.

Hmmm... That requires manually setting a limit. Would it not be possible
to fully automate the switchover? For example, one could keep a cpumask
of the processors that use the per cpu counters.
Then, in the fastpath, if the current cpu is a member of the cpumask,
increment the per cpu counter. If not, do the spinlock thing. If there is
contention, add the cpu to the cpumask and use the per cpu counters. That
automatically scales for the processors on which frequent increments are
operating.

Then regularly (once per minute or so) degenerate the counter by folding
the per cpu diffs into the global count and zapping the cpumask.

If the cpumask is empty you can use the global count. Otherwise you just
need to add up the counters of the cpus set in the cpumask.
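
To make the idea concrete, here is a rough userspace sketch in C11, not
kernel code and not a patch. Everything in it is hypothetical: the
struct auto_counter type, the ac_*() names, the 64-bit word standing in
for a cpumask, the atomic_flag standing in for the spinlock, and the
explicit cpu argument standing in for smp_processor_id(). It also assumes
that the sum is the global count plus the deltas of the cpus in the mask.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_CPUS 64	/* assumption: the cpumask fits in one 64-bit word */

struct auto_counter {
	atomic_flag lock;		/* stand-in for the spinlock */
	long count;			/* global count, protected by lock */
	_Atomic uint64_t pcpu_mask;	/* cpus currently using per cpu deltas */
	atomic_long pcpu[NR_CPUS];	/* per cpu deltas */
};

static void ac_lock(struct auto_counter *c)
{
	while (atomic_flag_test_and_set(&c->lock))
		;	/* spin until the flag was previously clear */
}

static void ac_unlock(struct auto_counter *c)
{
	atomic_flag_clear(&c->lock);
}

static void ac_init(struct auto_counter *c)
{
	atomic_flag_clear(&c->lock);
	c->count = 0;
	atomic_init(&c->pcpu_mask, 0);
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		atomic_init(&c->pcpu[cpu], 0);
}

/* Fastpath: per cpu delta if this cpu is in the mask, else try the lock. */
static void ac_add(struct auto_counter *c, int cpu, long amount)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	uint64_t bit = UINT64_C(1) << cpu;

	if (atomic_load(&c->pcpu_mask) & bit) {
		atomic_fetch_add(&c->pcpu[cpu], amount);
		return;
	}
	if (!atomic_flag_test_and_set(&c->lock)) {
		/* Lock was free: update the global count directly. */
		c->count += amount;
		ac_unlock(c);
		return;
	}
	/* Contention: switch this cpu over to its per cpu delta. */
	atomic_fetch_or(&c->pcpu_mask, bit);
	atomic_fetch_add(&c->pcpu[cpu], amount);
}

/* Global count plus the deltas of the cpus set in the mask. */
static long ac_sum(struct auto_counter *c)
{
	ac_lock(c);
	long sum = c->count;
	uint64_t mask = atomic_load(&c->pcpu_mask);

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			sum += atomic_load(&c->pcpu[cpu]);
	ac_unlock(c);
	return sum;
}

/*
 * Periodic degeneration: fold the per cpu deltas into the global count
 * and zap the mask. Known gap in this sketch: a cpu that tested the mask
 * just before the zap can still add to its delta afterwards, and that
 * delta stays invisible to ac_sum() until the cpu rejoins the mask. A
 * real implementation would have to close that window.
 */
static void ac_fold(struct auto_counter *c)
{
	ac_lock(c);
	uint64_t mask = atomic_exchange(&c->pcpu_mask, 0);

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			c->count += atomic_exchange(&c->pcpu[cpu], 0);
	ac_unlock(c);
}
```

ac_fold() would be driven from a timer or similar at the "once per
minute or so" interval. An uncontended counter never leaves the
global-count path, so no limit has to be configured by hand.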