Date: Fri, 7 Aug 2020 14:29:09 -0700
Message-Id: <20200807212916.2883031-1-jwadams@google.com>
Subject: [RFC PATCH 0/7] metricfs metric file system and examples
From: Jonathan Adams
To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: netdev@vger.kernel.org, kvm@vger.kernel.org, Paolo Bonzini, Greg KH, Jim Mattson, David Rientjes, Jonathan Adams

[Resending to widen the CC lists per rdunlap@infradead.org's suggestion. The original posting to lkml is here: https://lkml.org/lkml/2020/8/5/1009]

To try to restart the discussion of kernel statistics started by the statsfs patchsets (https://lkml.org/lkml/2020/5/26/332), I wanted to share the following set of patches, which are Google's 'metricfs' implementation and some example uses.

Google has been using metricfs internally since 2012 as a way to export various statistics to our telemetry systems (similar to OpenTelemetry), and a typical machine exports over 200 statistics.

These patches have been cleaned up and modernized relative to
the versions in production; I've included notes under the fold in the patches. They're based on v5.8-rc6.

The statistics live under debugfs, in a tree rooted at:

  /sys/kernel/debug/metricfs

Each metric is a directory containing four files. For example, the 'core/metricfs: Create metricfs, standardized files under debugfs.' patch includes a simple 'metricfs_presence' metric, whose files look like:

/sys/kernel/debug/metricfs:
metricfs_presence/annotations
  DESCRIPTION A\ basic\ presence\ metric.
metricfs_presence/fields
  value
  int
metricfs_presence/values
  1
metricfs_presence/version
  1

(The "version" file always contains '1' and is essentially vestigial.)

The networking stats are a more complicated example. The tx_bytes stat, for instance, looks like:

net/dev/stats/tx_bytes/annotations
  DESCRIPTION net\ device\ transmited\ bytes\ count
  CUMULATIVE
net/dev/stats/tx_bytes/fields
  interface value
  str int
net/dev/stats/tx_bytes/values
  lo 4394430608
  eth0 33353183843
  eth1 16228847091
net/dev/stats/tx_bytes/version
  1

Per-CPU statistics show up in the scheduler stat info and the x86 IRQ counts. For example:

stat/user/annotations
  DESCRIPTION time\ in\ user\ mode\ (nsec)
  CUMULATIVE
stat/user/fields
  cpu value
  int int
stat/user/values
  0 1183486517734
  1 1038284237228
  ...
stat/user/version
  1

The full set of example metrics I've included is:

core/metricfs: Create metricfs, standardized files under debugfs.
  metricfs_presence
core/metricfs: metric for kernel warnings
  warnings/values
core/metricfs: expose scheduler stat information through metricfs
  stat/*
net-metricfs: Export /proc/net/dev via metricfs.
  net/dev/stats/[tr]x_*
core/metricfs: expose x86-specific irq information through metricfs
  irq_x86/*

The general approach is called out in kernel/metricfs.c:

The kernel provides:
 - A description of the metric
 - The subsystem for the metric (NULL is ok)
 - Type information about the metric, and
 - A callback function which supplies metric values.

Limitations:
 - "values" files are at MOST 64K.
   We truncate the file at that point.
 - The list of fields and types is at most 1K.
 - Metrics may have at most 2 fields.

Best Practices:
 - Emit the most important data first! Once the 64K per-metric buffer
   is full, the emit* functions won't do anything.
 - In userspace, open(), read(), and close() the file quickly! The
   kernel allocation for the metric is alive as long as the file is
   open. This lets users seek around the contents of the file while
   still getting an atomic view of the data.

Note that since the callbacks are called and the data is generated at file open() time, consistency holds only among the members of a single metric: the rx_bytes stat for every network interface will be read at almost the same time, but if you want both rx_bytes and rx_packets, there could be noticeable skew between the two file opens. (So this doesn't entirely address Andrew Lunn's comments in https://lkml.org/lkml/2020/5/26/490.)

This also doesn't address one of the basic parts of the statsfs work: moving the statistics out of debugfs to avoid lockdown interactions.

Google has found a lot of value in having a generic interface for adding these kinds of statistics with reasonably low overhead (reading them is O(number of statistics), not O(number of objects in each statistic)). There are definitely warts in the interface, but does the basic approach make sense to folks?

Thanks,
- Jonathan

Jonathan Adams (5):
  core/metricfs: add support for percpu metricfs files
  core/metricfs: metric for kernel warnings
  core/metricfs: expose softirq information through metricfs
  core/metricfs: expose scheduler stat information through metricfs
  core/metricfs: expose x86-specific irq information through metricfs

Justin TerAvest (1):
  core/metricfs: Create metricfs, standardized files under debugfs.

Laurent Chavey (1):
  net-metricfs: Export /proc/net/dev via metricfs.
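The Best Practices above (open, read, and close a "values" file quickly) and the two-column "values" layout shown earlier can be illustrated from the userspace side with a minimal sketch. It is not code from this patchset: the helpers read_metric_file(), metric_lookup() and metric_sum() are hypothetical, and the sketch assumes each "values" line is "<key> <value>" with a single space separator and no spaces inside the key (how metricfs escapes spaces in string keys is not covered here).

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical userspace helpers, not part of the metricfs patches.
 * Assumes "values" lines look like "<key> <value>\n" with no spaces in the key.
 */

/* Upper bound on a "values" file, per the Limitations list. */
#define METRICFS_VALUES_MAX (64 * 1024)

/*
 * Read the whole file into buf (NUL-terminated), then close it right
 * away so the kernel-side snapshot is released quickly.
 * Returns the number of bytes read, or -1 on error.
 */
static ssize_t read_metric_file(const char *path, char *buf, size_t cap)
{
	size_t used = 0;
	int fd;

	if (cap == 0)
		return -1;
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	while (used < cap - 1) {
		ssize_t n = read(fd, buf + used, cap - 1 - used);

		if (n < 0 && errno == EINTR)
			continue;	/* interrupted by a signal: retry */
		if (n < 0) {
			close(fd);
			return -1;
		}
		if (n == 0)
			break;		/* end of file */
		used += (size_t)n;
	}
	close(fd);
	buf[used] = '\0';
	return (ssize_t)used;
}

/* Find the line whose key equals `key`; 0 on success, -1 if absent. */
static int metric_lookup(const char *buf, const char *key,
			 unsigned long long *out)
{
	size_t klen = strlen(key);
	const char *line = buf;

	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);
		const char *sp = memchr(line, ' ', len);

		if (sp && (size_t)(sp - line) == klen &&
		    memcmp(line, key, klen) == 0) {
			*out = strtoull(sp + 1, NULL, 10);
			return 0;
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return -1;
}

/* Sum the value column, e.g. total user time across CPUs in stat/user. */
static unsigned long long metric_sum(const char *buf)
{
	unsigned long long total = 0;
	const char *line = buf;

	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);
		const char *sp = memchr(line, ' ', len);

		if (sp)
			total += strtoull(sp + 1, NULL, 10);
		if (!eol)
			break;
		line = eol + 1;
	}
	return total;
}
```

For example, a monitoring agent could read /sys/kernel/debug/metricfs/net/dev/stats/tx_bytes/values into a buffer of METRICFS_VALUES_MAX + 1 bytes and then call metric_lookup(buf, "eth0", &v), or apply metric_sum() to stat/user/values to total user time across CPUs. Because the file is closed before any parsing happens, the kernel-side allocation is held only for the duration of the read; reading two different metrics still takes two opens, so the cross-metric skew described above still applies.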
 arch/x86/kernel/irq.c      |  80 ++++
 fs/proc/stat.c             |  57 +++
 include/linux/metricfs.h   | 131 +++++++
 kernel/Makefile            |   2 +
 kernel/metricfs.c          | 775 +++++++++++++++++++++++++++++++++++++
 kernel/metricfs_examples.c | 151 ++++++++
 kernel/panic.c             | 131 +++++++
 kernel/softirq.c           |  45 +++
 lib/Kconfig.debug          |  18 +
 net/core/Makefile          |   1 +
 net/core/net_metricfs.c    | 194 ++++++++++
 11 files changed, 1585 insertions(+)
 create mode 100644 include/linux/metricfs.h
 create mode 100644 kernel/metricfs.c
 create mode 100644 kernel/metricfs_examples.c
 create mode 100644 net/core/net_metricfs.c

-- 
2.28.0.236.gb10cc79966-goog