Subject: Re: [PATCH 0/2] /proc/stat: Reduce irqs counting performance overhead
From: Waiman Long
To: Michal Hocko, Dave Chinner
Cc: Andrew Morton, Alexey Dobriyan, Luis Chamberlain, Kees Cook,
 Jonathan Corbet, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, Davidlohr Bueso, Miklos Szeredi,
 Daniel Colascione, Randy Dunlap
References: <1546873978-27797-1-git-send-email-longman@redhat.com>
 <20190107223214.GZ6311@dastard>
 <9b4208b7-f97b-047c-4dab-15bd3791e7de@redhat.com>
 <20190108020422.GA27534@dastard>
 <20190108161100.GE31793@dhcp22.suse.cz>
 <5525323d-7465-5bfc-862e-a3bcff61fb00@redhat.com>
Organization: Red Hat
Message-ID: <82f8d56e-013a-03c5-2fe0-c928e2febb96@redhat.com>
Date: Tue, 8 Jan 2019 12:32:39 -0500
In-Reply-To: <5525323d-7465-5bfc-862e-a3bcff61fb00@redhat.com>

On 01/08/2019 12:05 PM, Waiman Long wrote:
> On 01/08/2019 11:11 AM, Michal Hocko wrote:
>> On Tue 08-01-19 13:04:22, Dave Chinner wrote:
>>> On Mon, Jan 07, 2019 at 05:41:39PM -0500, Waiman Long wrote:
>>>> On 01/07/2019 05:32 PM, Dave Chinner wrote:
>>>>> On Mon, Jan 07, 2019 at 10:12:56AM -0500, Waiman Long wrote:
>>>>>> As newer systems have more and more IRQs and CPUs available in their
>>>>>> system, the performance of reading /proc/stat frequently is getting
>>>>>> worse and worse.
>>>>> Because the "roll-your-own" per-cpu counter implementation has been
>>>>> optimised for the lowest possible addition overhead on the premise that
>>>>> summing the counters is rare and isn't a performance issue. This
>>>>> patchset is a direct indication that this "summing is rare and can
>>>>> be slow" premise is now invalid.
>>>>>
>>>>> We have percpu counter infrastructure that trades off a small amount
>>>>> of addition overhead for zero-cost reading of the counter value.
>>>>> i.e. why not just convert this whole mess to percpu_counters and
>>>>> then just use percpu_counter_read_positive()? Then we just don't
>>>>> care how often userspace reads the /proc file because there is no
>>>>> summing involved at all...
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Dave.
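
(For anyone skimming the thread, a very rough sketch of the kind of
conversion Dave is describing -- this is not code from the patch series,
and the structure and helper names below are made up for illustration;
only the percpu_counter_*() calls are the real generic API:)

#include <linux/percpu_counter.h>

/* Hypothetical per-irq counter replacing the roll-your-own per-cpu sum. */
struct irq_stat_example {
	struct percpu_counter count;	/* needs percpu_counter_init()/destroy() */
};

/* Hot path (interrupt entry): a cheap, mostly per-cpu add. */
static inline void irq_stat_example_inc(struct irq_stat_example *s)
{
	percpu_counter_add(&s->count, 1);
}

/* /proc/stat read side: no for_each_possible_cpu() summing loop,
 * just return the (slightly stale) shared count, clamped at zero.
 */
static inline s64 irq_stat_example_read(struct irq_stat_example *s)
{
	return percpu_counter_read_positive(&s->count);
}

The irq descriptor setup/teardown would still need the matching
percpu_counter_init()/percpu_counter_destroy() calls, which I have left
out of the sketch.
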
>>>> Yes, percpu_counter_read_positive() is cheap. However, you still need to
>>>> pay the price somewhere. In the case of percpu_counter, the update is
>>>> more expensive.
>>> Ummm, that's exactly what I just said. It's a percpu counter that
>>> solves the "sum is expensive and frequent" problem, just like you
>>> are encountering here. I do not need basic scalability algorithms
>>> explained to me.
>>>
>>>> I would say the percentage of applications that will hit this problem is
>>>> small. But for them, this problem has some significant performance overhead.
>>> Well, duh!
>>>
>>> What I was suggesting is that you change the per-cpu counter
>>> implementation to the /generic infrastructure/ that solves this
>>> problem, and then determine if the extra update overhead is at all
>>> measurable. If you can't measure any difference in update overhead,
>>> then slapping complexity on the existing counter to attempt to
>>> mitigate the summing overhead is the wrong solution.
>>>
>>> Indeed, it may be that you need to use a custom batch scaling curve
>>> for the generic per-cpu counter infrastructure to mitigate the
>>> update overhead, but the fact is we already have generic
>>> infrastructure that solves your problem and so the solution should
>>> be "use the generic infrastructure" until it can be proven not to
>>> work.
>>>
>>> i.e. prove the generic infrastructure is not fit for purpose and
>>> cannot be improved sufficiently to work for this use case before
>>> implementing a complex, one-off snowflake counter implementation...
>> Completely agreed! Apart from that, I find the conversion to a generic
>> infrastructure worthwhile even if it doesn't solve the problem at hand
>> completely. If for no other reason than the sheer code removal, as kstat
>> is not really used for anything apart from this accounting AFAIR. The
>> less ad-hoc code we have the better IMHO.

Another point that I want to make is that I don't see the kstat code
ever being removed unless we scrap the whole /proc/stat file. IRQ count
reporting is a performance problem simply because of the large number of
IRQs (in the thousands). The other percpu counts are currently fine as
they only go up to hundreds at most.

Cheers,
Longman

>> And to the underlying problem. Some proc files do not scale on large
>> machines. Maybe it is time to explain to application writers that
>> if they are collecting data too aggressively then it won't scale. We can
>> only do so much. Lying about numbers by hiding updates is, well,
>> lying and won't solve the underlying problem.
> I would not say it is lying. As I said in the changelog, reading
> /proc/stat infrequently will give the right counts. It is only when it is
> read frequently that the data may not be up-to-date. Using
> percpu_counter_read_positive() as suggested by Dave means that the
> counts will likely be off by a certain amount too. So it is also a
> trade-off between accuracy and performance.
>
> Cheers,
> Longman
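
P.S. On Dave's "custom batch scaling" point: the knob the generic
infrastructure exposes for that trade-off is the batch argument. Another
made-up illustration (again not from the patches; only the
percpu_counter_*() calls are the real API):

#include <linux/percpu_counter.h>

/*
 * Illustration only: a larger batch keeps the hot-path add almost
 * entirely per-cpu, at the cost of percpu_counter_read_positive()
 * being off by up to roughly num_online_cpus() * IRQ_STAT_BATCH.
 */
#define IRQ_STAT_BATCH	1024

static inline void irq_stat_example_inc_batched(struct percpu_counter *c)
{
	percpu_counter_add_batch(c, 1, IRQ_STAT_BATCH);
}

/* An exact (but expensive, cpu-walking) sum is still available
 * for readers that really need accuracy.
 */
static inline s64 irq_stat_example_sum(struct percpu_counter *c)
{
	return percpu_counter_sum_positive(c);
}

Whether the extra update cost is measurable on interrupt entry is
exactly the experiment Dave is asking for before adding a one-off
counter scheme.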