Received: by 2002:a25:23cc:0:0:0:0:0 with SMTP id j195csp932371ybj; Thu, 7 May 2020 10:48:22 -0700 (PDT) X-Google-Smtp-Source: APiQypLuyuCdye1L/YuE+AD6ZuyLrxT/E7zqS7H+lC5kwQMEheeWUQbXCHkjA4lkMBVdq8fwPxOj X-Received: by 2002:aa7:d403:: with SMTP id z3mr13428697edq.43.1588873702197; Thu, 07 May 2020 10:48:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1588873702; cv=none; d=google.com; s=arc-20160816; b=gRbPU0ls3YMkhr9v6iPOMwo6S8Aik39dLixxLo0TT2ofUNZwo5Wt6be5BUJjBfOBSg vhmqLgGs3w8PY03YMZ55LGv3DC4BzECxLhYXAQthqUl+1ZXNJhvwMm6M0wy5/GfBjhO4 vKGvtMXw+WdrvYtPbVUKiGBrasA5yAj4Ze7Ed4zhJkiZNoognbD+z2/go78QKJfVxH71 9Z3uLt5+gF5T8jh/EqzH7FKrf+6D7HFOr7gY+jCEDu0V8Yd30W/vtqsVL3iZTXF2JC+9 51mwyxFPNUvzwF4YWJinGDibIWTQ/R4mwVcxkawoWNcqfOh3Lnv1uK+9f58szWyOuUOu XgsA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=yU0mJ28GxeMYjUBMEveE72PctswqptxZPEqLuEZt3cA=; b=V3oAm2YZcTuF20V0Mnxwka/ndFljLyaxAdfvLQB1RXblqROUhPHbzuLIIcZnsd0vk4 R9p23WyZpt3H7vEB3u1mcvwhdSN3dGt9viHUFOO6aJpeaclr62N6LOAO8nQDe+UOLVBR egWg80sG0XbXfJepkZxhX0GVPtG0sdd5wOiTt7VFPpOVmqy0mOqv/3c7G0xlxb5seIPo hnpbEQSlzHQUk46jMjfy6wq27vjyDHLedjjNsCUD7ZFGFr39GgYw6EM+A8FWOrJx5z/u dJRfQEdVOCND7rp5mIJrdVRQNoTJGqHHbzoXgyfoI9ABB4fOF9z5l4OKYLcPso4uX9aL WArg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=vnd3Wz+k; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id ck24si3680311ejb.45.2020.05.07.10.47.58; Thu, 07 May 2020 10:48:22 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=vnd3Wz+k; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726950AbgEGRqU (ORCPT + 99 others); Thu, 7 May 2020 13:46:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51778 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1726518AbgEGRqS (ORCPT ); Thu, 7 May 2020 13:46:18 -0400 Received: from mail-io1-xd41.google.com (mail-io1-xd41.google.com [IPv6:2607:f8b0:4864:20::d41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7D56DC05BD09 for ; Thu, 7 May 2020 10:46:18 -0700 (PDT) Received: by mail-io1-xd41.google.com with SMTP id 19so1379239ioz.10 for ; Thu, 07 May 2020 10:46:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=yU0mJ28GxeMYjUBMEveE72PctswqptxZPEqLuEZt3cA=; b=vnd3Wz+k5EBZmosgqi5KKkGOlJPJfOp4+cC5rwQ5WKpxRuQIpmmoJ3cc9pg33/Ra2g /q3iGdwk0pNt0z7FaQrx4sYZVKT6xvn6L11gFpzroAtPAbaIaFOaFA33AOzmsY2oJVzE jtgyAMHmQRTny/Ymt7q8IJQ7lmehM/dp2fOPtkiEAf4fIS+TJ1gOCEXPyqVc+EempRvT LnIlcIIg3aPKzbfL50LWiVOlU2fXt/KiLSoS1PkVzECJlOLjp7YUx+z81+MOMhjkymjK 17ES11qP77zrlhR4/OPv4Jsjafu984eSlgI/TG6K8fqacWkVukhDDXeHg0UV09ujnOXR 8eVw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=yU0mJ28GxeMYjUBMEveE72PctswqptxZPEqLuEZt3cA=; b=o+ACzrfB7ATVF4nxfUn5ghsQHzog1mT6Ikdp/0QbeYl4kRjQpQzP958fA/kws4BIhC g/eVnsvpq4jZh3xy4BAxSVXXARIAZ4cqua5FUFPOAaMe5CHtKyeMMmDOugTF52u9DbZS TeX2bXD9xdUvyk0Dy/UuPemB+uTMpDLFt05p8BN1z3c9POc9kn+sjJ6H+nYtRJbNg1hD iHQacdKflaY2bUdnJ2Z2DKlLv9cT1kSz7vUljZ1EpTjyj81ySkNuC2zpbNH5HlXws4qk BND7f39h/89Blkufut/R1fGRw5nlcQsIofNVuzYZI8c4cj8TnekDVLHwlZgaYrm9W/sv KypQ== X-Gm-Message-State: AGi0PuZrnGZPKep28G7LicHS8/1mxw1w4KYeGgSdOyIx2VWpMgVki92P zBaqelz+qZHryYpux82/Pv52VbG52acpex2F7I5L X-Received: by 2002:a6b:c9cc:: with SMTP id z195mr2514506iof.164.1588873577069; Thu, 07 May 2020 10:46:17 -0700 (PDT) MIME-Version: 1.0 References: <20200504110344.17560-1-eesposit@redhat.com> In-Reply-To: <20200504110344.17560-1-eesposit@redhat.com> From: Jonathan Adams Date: Thu, 7 May 2020 10:45:40 -0700 Message-ID: Subject: Re: [PATCH v2 0/5] Statsfs: a new ram-based file sytem for Linux kernel statistics To: Emanuele Giuseppe Esposito Cc: kvm list , Christian Borntraeger , David Hildenbrand , Cornelia Huck , Paolo Bonzini , Vitaly Kuznetsov , Jim Mattson , Alexander Viro , Emanuele Giuseppe Esposito , LKML , linux-mips@vger.kernel.org, kvm-ppc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-fsdevel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, May 4, 2020 at 4:05 AM Emanuele Giuseppe Esposito wrote: ... > Statsfs offers a generic and stable API, allowing any kind of > directory/file organization and supporting multiple kind of aggregations > (not only sum, but also average, max, min and count_zero) and data types > (all unsigned and signed types plus boolean). The implementation, which i= s > a generalization of KVM=E2=80=99s debugfs statistics code, takes care of = gathering > and displaying information at run time; users only need to specify the > values to be included in each source. > > Statsfs would also be a different mountpoint from debugfs, and would not > suffer from limited access due to the security lock down patches. Its mai= n > function is to display each statistics as a file in the desired folder > hierarchy defined through the API. Statsfs files can be read, and possibl= y > cleared if their file mode allows it. > > Statsfs has two main components: the public API defined by > include/linux/statsfs.h, and the virtual file system which should end up > in /sys/kernel/stats. This is good work. As David Rientjes mentioned, I'm currently investigatin= g a similar project, based on a google-internal debugfs-based FS we call "metricfs". It's designed in a slightly different fashion than statsfs here is, and the statistics exported are mostly fed into our OpenTelemetry-like system. We're motivated by wanting an upstreamed solution, so that we can upstream the metrics we create that are of general interest, and lower the overall rebasing burden for our tree. Some feedback on your design as proposed: - the 8/16/32/64 signed/unsigned integers seems like a wart, and the built-in support to grab any offset from a structure doesn't seem like much of an advantage. A simpler interface would be to just support an "integer" (possibly signed/unsigned) type, which is always 64-bit, and allow the caller to provide a function pointer to retrieve the value, with one or two void *s cbargs. Then the framework could provide an offset-based callback (or callbacks) similar to the existing functionality, and a similar one for per-CPU based statistics. A second "clear" callback could be optionally provided to allow for statistics to be cleared, as in your current proposal. - A callback-style interface also allows for a lot more flexibility in sourcing values, and doesn't lock your callers into one way of storing them. You would, of course, have to be clear about locking rules etc. for the callbacks. - Beyond the statistic's type, one *very* useful piece of metadata for telemetry tools is knowing whether a given statistic is "cumulative" (an unsigned counter which is only ever increased), as opposed to a floating value (like "amount of memory used"). I agree with the folks asking for a binary interface to read statistics, but I also agree that it can be added on later. I'm more concerned with getting the statistics model and capabilities right from the beginning, because those are harder to adjust later. Would you be open to collaborating on the statsfs design? As background for this discussion, here are some details of how our metricfs implementation approaches statistics: 1. Each metricfs metric can have one or two string or integer "keys". If these exist, they expand the metric from a single value into a multi-dimensional table. For example, we use this to report a hash table we keep of functions calling "WARN()", in a 'warnings' statistic: % cat .../warnings/values x86_pmu_stop 1 % Indicates that the x86_pmu_stop() function has had a WARN() fire once since the system was booted. If multiple functions have fired WARN()s, they are listed in this table with their own counts. [1] We also use these to report per-CPU counters on a CPU-by-CPU basis: % cat .../irq_x86/NMI/values 0 42 1 18 ... one line per cpu % 2. We also export some metadata about each statistic. For example, the metadata for the NMI counter above looks like: % cat .../NMI/annotations DESCRIPTION Non-maskable\ interrupts CUMULATIVE % cat .../NMI/fields cpu value int int % (Describing the statistic, marking it as "cumulative", and saying the fields are "cpu" and "value", both ints). The metadata doesn't change much, so having separate files allows the user-space agent to read them once and then the values multiple times. 3. We have a (very few) statistics where the value itself is a string, usually for device statuses. For our use cases, we generally don't both output a statistic and it's aggregation from the kernel; either we sum up things in the kernel (e.g. over a bunch of per-cpu or per-memcg counters) and only have the result statistic, or we expect user-space to sum up the data if it's interested. The tabular form makes it pretty easy to do so (i.e. you can use awk(1) to sum all of the per-cpu NMI counters). We don't generally reset statistics, except as a side effect of removing a device. Thanks again for the patchset, and for pointing out that KVM also needs statistics sent out; it's great that there is interest in this. Cheers, - jonathan P.S. I also have a couple (non-critical) high-level notes: * It's not clear what tree your patches are against, or their dependencies; I was able to get them to apply to linux-next master with a little massaging, but then they failed to compile because they're built on top of your "libfs: group and simplify linux fs code" patch series you sent out in late april. Including a git link or at least a baseline tree and a list of the patch series you rely upon is helpful for anyone wanting to try out your changes. * The main reason I was trying to try out your patches was to get a sense of the set of directories and things the KVM example generates; while it's apparently the same as the existing KVM debugfs tree, it's useful to know how this ends up looking on a real system, and I'm not familiar with the KVM stats. Since this patch is intended slightly more broadly than just KVM, it might have been useful to include sample output for those not familiar with how things are today. [1] We also use this to export various network/storage statistics on a per-device basis. e.g. network bytes received counts: % cat .../rx_bytes/values lo 501360681 eth0 1457631256 ... %