Date: Mon, 29 Oct 2018 17:58:45 -0700
From: Vito Caputo
To: Daniel Colascione
Cc: Davidlohr Bueso, Andrew Morton, longman@redhat.com, linux-fsdevel@vger.kernel.org, linux-kernel, Davidlohr Bueso
Subject: Re: [PATCH] fs/proc: introduce /proc/stat2 file
Message-ID: <20181030005845.c3x2z72ns4qhbida@shells.gnugeneration.com>
References: <20181029192521.23059-1-dave@stgolabs.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Oct 29, 2018 at 11:04:45PM +0000, Daniel Colascione wrote:
> On Mon, Oct 29, 2018 at 7:25 PM, Davidlohr Bueso wrote:
> > This patch introduces a new /proc/stat2 file that is identical to the
> > regular 'stat' except that it zeroes all hard irq statistics. The new
> > file is a drop in replacement to stat for users that need performance.
>
> For a while now, I've been thinking over ways to improve the
> performance of collecting various bits of kernel information. I don't
> think that a proliferation of special-purpose named bag-of-fields file
> variants is the right answer, because even if you add a few info-file
> variants, you're still left with a situation where a given file
> provides a particular caller with too little or too much information.
> I'd much rather move to a model in which userspace *explicitly* tells
> the kernel which fields it wants, with the kernel replying with just
> those particular fields, maybe in their raw binary representations.
> The ASCII-text bag-of-everything files would remain available for
> ad-hoc and non-performance-critical use, but programs that cared about
> performance would have an efficient bypass. One concrete approach is
> to let users open up today's proc files and, instead of read(2)ing a
> text blob, use an ioctl to retrieve specified and targeted information
> of the sort that would normally be encoded in the text blob. Because
> callers would open the same file when using either the text or binary
> interfaces, little would have to change, and it'd be easy to implement
> fallbacks when a particular system doesn't support a particular
> fast-path ioctl.

We have two extremes of granularity in the /proc and /sys virtual
filesystems today: On procfs there are these legacy files which
aggregate loosely-related system information, and in cases where you
actually want most of what's provided, it's a nice optimization because
you can sample it all in a single pread() call. On sysfs the
granularity is much finer, with it being fairly common to find a file
per datum. This has other advantages, like not needing to parse
snowflake formats which sometimes varied across kernel versions like in
procfs, or needing to burden the kernel to produce more information
than necessary.
But anyone who has written tools trying to sample large subsets of the
granular information in sysfs at a high rate will know how quickly it
becomes rather costly in terms of system calls. The last time I went
down this path, I wished there were a system call like readv() which
accepted a vector of a new iovec type that also specifies an fd. Then
the sysfs model could be made more efficient by coalescing all the
required read syscalls into a single megaread bundling all the relevant
fds, which are simply kept open and reused.

If we had such a readv() variant, the sysfs granular model could be
used to granularly expose all the information we currently expose in
/proc, while still being relatively efficient in terms of system calls
per sample. Sure, you still have to look up and open all the files of
interest, but that only needs to occur once at initialization.

Regards,
Vito Caputo