Date: Tue, 8 Jan 2019 17:11:00 +0100
From: Michal Hocko
To: Dave Chinner
Cc: Waiman Long, Andrew Morton, Alexey Dobriyan, Luis Chamberlain,
 Kees Cook, Jonathan Corbet, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 Davidlohr Bueso, Miklos Szeredi, Daniel Colascione, Randy Dunlap
Subject: Re: [PATCH 0/2] /proc/stat: Reduce irqs counting performance overhead
Message-ID: <20190108161100.GE31793@dhcp22.suse.cz>
References: <1546873978-27797-1-git-send-email-longman@redhat.com>
 <20190107223214.GZ6311@dastard>
 <9b4208b7-f97b-047c-4dab-15bd3791e7de@redhat.com>
 <20190108020422.GA27534@dastard>
In-Reply-To: <20190108020422.GA27534@dastard>
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 08-01-19 13:04:22, Dave Chinner wrote:
> On Mon, Jan 07, 2019 at 05:41:39PM -0500, Waiman Long wrote:
> > On 01/07/2019 05:32 PM, Dave Chinner wrote:
> > > On Mon, Jan 07, 2019 at 10:12:56AM -0500, Waiman Long wrote:
> > > > As newer systems have more and more IRQs and CPUs available in their
> > > > system, the performance of reading /proc/stat frequently is getting
> > > > worse and worse.
> > >
> > > Because the "roll-your-own" per-cpu counter implementation has been
> > > optimised for the lowest possible addition overhead on the premise that
> > > summing the counters is rare and isn't a performance issue. This
> > > patchset is a direct indication that this "summing is rare and can
> > > be slow" premise is now invalid.
> > >
> > > We have percpu counter infrastructure that trades off a small amount
> > > of addition overhead for zero-cost reading of the counter value.
> > > i.e. why not just convert this whole mess to percpu_counters and
> > > then just use percpu_counter_read_positive()? Then we just don't
> > > care how often userspace reads the /proc file because there is no
> > > summing involved at all...
> > >
> > > Cheers,
> > >
> > > Dave.
> >
> > Yes, percpu_counter_read_positive() is cheap. However, you still need to
> > pay the price somewhere. In the case of percpu_counter, the update is
> > more expensive.
>
> Ummm, that's exactly what I just said. It's a percpu counter that
> solves the "sum is expensive and frequent" problem, just like the one
> you are encountering here. I do not need basic scalability algorithms
> explained to me.
>
> > I would say the percentage of applications that will hit this problem
> > is small. But for them, this problem has some significant performance
> > overhead.
>
> Well, duh!
>
> What I was suggesting is that you change the per-cpu counter
> implementation to the /generic infrastructure/ that solves this
> problem, and then determine if the extra update overhead is at all
> measurable.
> If you can't measure any difference in update overhead,
> then slapping complexity onto the existing counter to attempt to
> mitigate the summing overhead is the wrong solution.
>
> Indeed, it may be that you need to use a custom batch scaling curve
> for the generic per-cpu counter infrastructure to mitigate the
> update overhead, but the fact is we already have generic
> infrastructure that solves your problem, and so the solution should
> be "use the generic infrastructure" until it can be proven not to
> work.
>
> i.e. prove that the generic infrastructure is not fit for purpose and
> cannot be improved sufficiently to work for this use case before
> implementing a complex, one-off snowflake counter implementation...

Completely agreed! Apart from that, I find the conversion to the generic
infrastructure worthwhile even if it doesn't solve the problem at hand
completely. If for no other reason than the sheer code removal, as kstat
is not really used for anything apart from this accounting AFAIR. The
less ad-hoc code we have, the better IMHO.

And to the underlying problem: some proc files do not scale on large
machines. Maybe it is time to explain to application writers that if
they are collecting data too aggressively then it won't scale. We can
only do this much. Lying about the numbers by hiding updates is, well,
lying, and won't solve the underlying problem.

-- 
Michal Hocko
SUSE Labs