Subject: Re: [PATCH] proc/stat: Separate out individual irq counts into /proc/stat_irqs
To: Alexey Dobriyan
Cc: linux-kernel@vger.kernel.org, rdunlap@infradead.org, akpm@linux-foundation.org
References: <20180419190846.GE2066@avx2> <1c3b9cf3-3a36-568f-3da2-e560a721f4aa@redhat.com> <20180419195504.GA4343@avx2> <20180419203949.GA4555@avx2>
From: Waiman Long
Organization: Red Hat
Date: Thu, 19 Apr 2018 16:58:53 -0400
In-Reply-To: <20180419203949.GA4555@avx2>
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/19/2018 04:39 PM, Alexey Dobriyan wrote:
> On Thu, Apr 19, 2018 at 04:21:14PM -0400, Waiman Long wrote:
>> On 04/19/2018 03:55 PM, Alexey Dobriyan wrote:
>>> On Thu, Apr 19, 2018 at 03:28:40PM -0400, Waiman Long wrote:
>>>> On 04/19/2018 03:08 PM, Alexey Dobriyan wrote:
>>>>>> Therefore, application performance can be impacted if the application
>>>>>> reads /proc/stat rather frequently.
>>>>> [nods]
>>>>> Text interfaces can be designed in a very stupid way.
>>>>>
>>>>>> For example, reading /proc/stat in a certain 2-socket Skylake server
>>>>>> took about 4.6ms because it had over 5k irqs.
>>>>> Is this top(1)? What is this application doing?
>>>>> If it needs percpu usage stats, then maybe /proc/stat should be
>>>>> converted away from single_open() so that core seq_file code doesn't
>>>>> generate everything at once.
>>>> The application is actually a database benchmarking tool used by a
>>>> customer.
>>> So it probably needs lines before "intr" line.
>>>
>>>> The reading of /proc/stat is an artifact of the benchmarking
>>>> tool that can actually be turned off. Without doing that, about 20% of
>>>> CPU time were spent reading /proc/stat and the trashing of cachelines
>>>> slowed the benchmark number quite significantly.
>>>> However, I was also
>>>> told that there are legitimate cases where reading /proc/stat was
>>>> necessary in some of their applications.
>>>>
>>>>>> -
>>>>>> -	/* sum again ? it could be updated? */
>>>>>> -	for_each_irq_nr(j)
>>>>>> -		seq_put_decimal_ull(p, " ", kstat_irqs_usr(j));
>>>>>> -
>>>>> This is direct userspace breakage.
>>>> Yes, I am aware of that. That is the cost of improving the performance
>>>> of applications that read /proc/stat, but don't need the individual irq
>>>> counts.
>>> Yeah, but all it takes is one script which cares.
>>>
>>> I have an idea.
>>>
>>> Maintain "maximum registered irq #", it should be much smaller than
>>> "nr_irqs":
>>>
>>> intr 4245359 151 0 0 0 0 0 0 0 38 0 0 0 0 0 0 0 0 0 64 0 0 0 0 0 0 0 0 0 0 44330 182364 57741 0 0 0 0 0 0 0 0 85 89124 0 0 0 0 0 323360 0 0 0 0 0 0 0 0 [... all remaining entries are 0 ...]
>> Yes, that can probably help.
>>
>> This is the data from the problematic skylake server:
>>
>> model name : Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz
>> 56 sosreport-carevalo.02076935-20180413085327/proc/stat
>> Interrupts: 5370
>> Interrupts without "0" entries: 1011
>>
>> There are still quite a large number of non-zero entries, though.
>>
>>> Or maintain array of registered irqs and iterate over them only.
>> Right, we can allocate a bitmap of used irqs to do that.
>>
>>> I have another idea.
>>>
>>> perf record shows mutex_lock/mutex_unlock at the top.
>>> Most of them are irq mutex not seqfile mutex as there are many more
>>> interrupts than reads. Take it once.
>>>
>> How many cpus are in your test system? In that skylake server, it was
>> the per-cpu summing operation of the irq counts that was consuming most
>> of the time for reading /proc/stat. I think we can certainly try to
>> optimize the lock taking.
> It's 16x(NR_IRQS: 4352, nr_irqs: 960, preallocated irqs: 16)
> Given that irq registering is rare operation, maintaining sorted array
> of irq should be the best option.
>> For the time being, I think I am going to have a clone /proc/stat2 as
>> suggested in my earlier email. Alternatively, I can put that somewhere
>> in sysfs if you have a good idea of where I can put it.
> sysfs is strictly one-value-per-file.
>
>> I will also look into ways to optimize the current per-IRQ stats
>> handling, but it will come later.
> There is always a time-honored way of ioctl(2) switching irq info off
> /proc supports that.
>
> There are many options.

OK, it is good to know. Do you have any existing code snippet in the
kernel that I can use as reference on how to use ioctl(2) switching?

I will look into how to optimize the existing per-IRQ stats code first
before venturing into cloning /proc/stat.

Cheers,
Longman
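
For illustration only, here is a minimal userspace sketch of the "bitmap of used irqs" idea discussed in the thread. It is not kernel code and not the posted patch. The names (irq_model, model_*) and the tiny sizes are hypothetical and chosen only to keep the example readable. The point it tries to show is that the "intr" line can keep its existing format (one entry per irq number) while the expensive per-cpu summing is done only for irqs whose bit is set.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sizes for this userspace model; not kernel values. */
#define MODEL_NR_CPUS 4
#define MODEL_NR_IRQS 8
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS  ((MODEL_NR_IRQS + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct irq_model {
	unsigned long used[BITMAP_WORDS];	/* one bit per registered irq */
	unsigned long long count[MODEL_NR_CPUS][MODEL_NR_IRQS]; /* per-cpu counts */
};

void model_init(struct irq_model *m)
{
	memset(m, 0, sizeof(*m));
}

/* Registration is assumed to be rare, so it is the only place the bitmap changes. */
void model_register_irq(struct irq_model *m, unsigned int irq)
{
	assert(irq < MODEL_NR_IRQS);
	m->used[irq / BITS_PER_WORD] |= 1UL << (irq % BITS_PER_WORD);
}

int model_irq_used(const struct irq_model *m, unsigned int irq)
{
	return (int)((m->used[irq / BITS_PER_WORD] >> (irq % BITS_PER_WORD)) & 1UL);
}

/* The expensive part: walk every cpu's counter for one irq. */
unsigned long long model_irq_sum(const struct irq_model *m, unsigned int irq)
{
	unsigned long long sum = 0;

	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += m->count[cpu][irq];
	return sum;
}

/*
 * Write "intr <total> <irq0> <irq1> ..." into buf, one entry per irq number.
 * Unregistered irqs are emitted as 0 without touching the per-cpu data.
 * Returns how many per-cpu summing passes were actually performed.
 */
unsigned int model_format_intr(const struct irq_model *m, char *buf, size_t len)
{
	unsigned long long sums[MODEL_NR_IRQS] = {0};
	unsigned long long total = 0;
	unsigned int passes = 0;
	size_t off;
	int n;

	for (unsigned int irq = 0; irq < MODEL_NR_IRQS; irq++) {
		if (!model_irq_used(m, irq))
			continue;	/* skip the per-cpu walk entirely */
		sums[irq] = model_irq_sum(m, irq);
		total += sums[irq];
		passes++;
	}

	n = snprintf(buf, len, "intr %llu", total);
	if (n < 0)
		return passes;
	off = (size_t)n;

	/* Stop early if the buffer is full; output is then truncated. */
	for (unsigned int irq = 0; irq < MODEL_NR_IRQS && off < len; irq++) {
		n = snprintf(buf + off, len - off, " %llu", sums[irq]);
		if (n < 0)
			break;
		off += (size_t)n;
	}
	return passes;
}
```

In this model, the return value of model_format_intr() equals the number of registered irqs rather than MODEL_NR_IRQS, which is the saving the bitmap is meant to provide. A sorted array of registered irq numbers, as also suggested in the thread, would serve the same purpose with a different lookup structure.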