2006-05-24 12:34:11

by Martin Peschke

[permalink] [raw]
Subject: [Patch 5/6] statistics infrastructure

This patch adds statistics infrastructure as common code.

Signed-off-by: Martin Peschke <[email protected]>
---

MAINTAINERS | 7
arch/s390/Kconfig | 6
arch/s390/oprofile/Kconfig | 5
include/linux/statistic.h | 348 ++++++++++
lib/Kconfig.statistic | 11
lib/Makefile | 2
lib/statistic.c | 1459 +++++++++++++++++++++++++++++++++++++++++++++
7 files changed, 1833 insertions(+), 5 deletions(-)

diff -Nurp a/include/linux/statistic.h b/include/linux/statistic.h
--- a/include/linux/statistic.h 1970-01-01 01:00:00.000000000 +0100
+++ b/include/linux/statistic.h 2006-05-19 16:23:07.000000000 +0200
@@ -0,0 +1,348 @@
+/*
+ * include/linux/statistic.h
+ *
+ * Statistics facility
+ *
+ * (C) Copyright IBM Corp. 2005, 2006
+ *
+ * Author(s): Martin Peschke <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef STATISTIC_H
+#define STATISTIC_H
+
+#include <linux/fs.h>
+#include <linux/types.h>
+#include <linux/percpu.h>
+
+#define STATISTIC_ROOT_DIR "statistics"
+
+#define STATISTIC_FILENAME_DATA "data"
+#define STATISTIC_FILENAME_DEF "definition"
+
+#define STATISTIC_NEED_BARRIER 1
+
+struct statistic;
+
+enum statistic_state {
+ STATISTIC_STATE_INVALID,
+ STATISTIC_STATE_UNCONFIGURED,
+ STATISTIC_STATE_RELEASED,
+ STATISTIC_STATE_OFF,
+ STATISTIC_STATE_ON
+};
+
+enum statistic_type {
+ STATISTIC_TYPE_COUNTER_INC,
+ STATISTIC_TYPE_COUNTER_PROD,
+ STATISTIC_TYPE_UTIL,
+ STATISTIC_TYPE_HISTOGRAM_LIN,
+ STATISTIC_TYPE_HISTOGRAM_LOG2,
+ STATISTIC_TYPE_SPARSE,
+ STATISTIC_TYPE_NONE
+};
+
+#define STATISTIC_FLAGS_NOINCR 0x01
+
+/**
+ * struct statistic_info - description of a class of statistics
+ * @name: pointer to name name string
+ * @x_unit: pointer to string describing unit of X of (X, Y) data pair
+ * @y_unit: pointer to string describing unit of Y of (X, Y) data pair
+ * @flags: only flag so far (distinction of incremental and other statistic)
+ * @defaults: pointer to string describing defaults setting for attributes
+ *
+ * Exploiters must setup an array of struct statistic_info for a
+ * corresponding array of struct statistic, which are then pointed to
+ * by struct statistic_interface.
+ *
+ * Struct statistic_info and all members and addressed strings must stay for
+ * the lifetime of corresponding statistics created with statistic_create().
+ *
+ * Except for the name string, all other members may be left blank.
+ * It would be nice of exploiters to fill it out completely, though.
+ */
+struct statistic_info {
+/* public: */
+ char *name;
+ char *x_unit;
+ char *y_unit;
+ int flags;
+ char *defaults;
+};
+
+/**
+ * struct statistic_interface - collection of statistics for an entity
+ * @stat: a struct statistic array
+ * @info: a struct statistic_info array describing the struct statistic array
+ * @number: number of entries in both arrays
+ * @pull: an optional function called when user reads data from file
+ * @pull_private: optional data pointer passed to pull function
+ *
+ * Exploiters must setup a struct statistic_interface prior to calling
+ * statistic_create().
+ */
+struct statistic_interface {
+/* private: */
+ struct list_head list;
+ struct dentry *debugfs_dir;
+ struct dentry *data_file;
+ struct dentry *def_file;
+/* public: */
+ struct statistic *stat;
+ struct statistic_info *info;
+ int number;
+ int (*pull)(void*);
+ void *pull_private;
+};
+
+struct sgrb_seg {
+ struct list_head list;
+ char *address;
+ int offset;
+ int size;
+};
+
+struct statistic_file_private {
+ struct list_head read_seg_lh;
+ struct list_head write_seg_lh;
+ size_t write_seg_total_size;
+};
+
+struct statistic_merge_private {
+ struct statistic *stat;
+ spinlock_t lock;
+ void *dst;
+};
+
+/**
+ * struct statistic_discipline - description of a data processing mode
+ * @parse: parses additional attributes specific to this mode (if any)
+ * @alloc: allocates a data area (mandatory, default routine available)
+ * @free: frees a data area (optional, kfree() is used otherwise)
+ * @reset: discards content of a data area (mandatory)
+ * @merge: merges content of a data area into another data area (mandatory)
+ * @fdata: prints content of a data area into buffer (mandatory)
+ * @fdef: prints additional attributes specific to this mode (if any)
+ * @add: updates a data area for a statistic fed incremental data (mandatory)
+ * @set: updates a data area for a statistic fed total numbers (mandatory)
+ * @name: pointer to name string (mandatory)
+ * @size: base size for a data area (passed to alloc function)
+ *
+ * Struct statistic_discipline describes a statistic infrastructure internal
+ * programming interface. Another data processing mode can be added by
+ * implementing these routines and appending an entry to the
+ * statistic_discs array.
+ *
+ * "Data area" in te above description usually means a chunk of memory,
+ * may it be allocated for data gathering per CPU, or be shared by all
+ * CPUs, or used for other purposes, like merging per-CPU data when
+ * users read data from files. Implementers of data processing modes
+ * don't need to worry about the designation of a particular chunk of memory.
+ * A data area of a data processing mode always has to look the same.
+ */
+struct statistic_discipline {
+ int (*parse)(struct statistic *, struct statistic_info *, int, char *);
+ void* (*alloc)(struct statistic *, size_t, gfp_t, int);
+ void (*free)(struct statistic *, void *);
+ void (*reset)(struct statistic *, void *);
+ void (*merge)(struct statistic *, void *, void*);
+ int (*fdata)(struct statistic *, const char *,
+ struct statistic_file_private *, void *);
+ int (*fdef)(struct statistic *, char *);
+ void (*add)(struct statistic *, int, s64, u64);
+ void (*set)(struct statistic *, s64, u64);
+ char *name;
+ size_t size;
+};
+
+struct statistic_entry_util {
+ u32 res;
+ u32 num; /* FIXME: better 64 bit; do_div can't deal with it) */
+ s64 acc;
+ s64 min;
+ s64 max;
+};
+
+struct statistic_entry_sparse {
+ struct list_head list;
+ s64 value;
+ u64 hits;
+};
+
+struct statistic_sparse_list {
+ struct list_head entry_lh;
+ u32 entries;
+ u32 entries_max;
+ u64 hits_missed;
+};
+
+/**
+ * struct statistic - any data required for gathering data for a statistic
+ */
+struct statistic {
+/* private: */
+ enum statistic_state state;
+ enum statistic_type type;
+ struct percpu_data *pdata;
+ void (*add)(struct statistic *, int, s64, u64);
+ u64 started;
+ u64 stopped;
+ u64 age;
+ union {
+ struct {
+ s64 range_min;
+ u32 last_index;
+ u32 base_interval;
+ } histogram;
+ struct {
+ u32 entries_max;
+ } sparse;
+ } u;
+};
+
+#ifdef CONFIG_STATISTICS
+
+extern int statistic_create(struct statistic_interface *, const char *);
+extern int statistic_remove(struct statistic_interface *);
+
+/**
+ * statistic_add - update statistic with incremental data in (X, Y) pair
+ * @stat: struct statistic array
+ * @i: index of statistic to be updated
+ * @value: X
+ * @incr: Y
+ *
+ * The actual processing of the (X, Y) data pair is determined by the current
+ * the definition applied to the statistic. See Documentation/statistics.txt.
+ *
+ * This variant takes care of protecting per-cpu data. It is preferred whenever
+ * exploiters don't update several statistics of the same entity in one go.
+ */
+static inline void statistic_add(struct statistic *stat, int i,
+ s64 value, u64 incr)
+{
+ unsigned long flags;
+ local_irq_save(flags);
+ if (stat[i].state == STATISTIC_STATE_ON)
+ stat[i].add(&stat[i], smp_processor_id(), value, incr);
+ local_irq_restore(flags);
+}
+
+/**
+ * statistic_add_nolock - update statistic with incremental data in (X, Y) pair
+ * @stat: struct statistic array
+ * @i: index of statistic to be updated
+ * @value: X
+ * @incr: Y
+ *
+ * The actual processing of the (X, Y) data pair is determined by the current
+ * definition applied to the statistic. See Documentation/statistics.txt.
+ *
+ * This variant leaves protecting per-cpu data to exploiters. It is preferred
+ * whenever exploiters update several statistics of the same entity in one go.
+ */
+static inline void statistic_add_nolock(struct statistic *stat, int i,
+ s64 value, u64 incr)
+{
+ if (stat[i].state == STATISTIC_STATE_ON)
+ stat[i].add(&stat[i], smp_processor_id(), value, incr);
+}
+
+/**
+ * statistic_inc - update statistic with incremental data in (X, 1) pair
+ * @stat: struct statistic array
+ * @i: index of statistic to be updated
+ * @value: X
+ *
+ * The actual processing of the (X, Y) data pair is determined by the current
+ * definition applied to the statistic. See Documentation/statistics.txt.
+ *
+ * This variant takes care of protecting per-cpu data. It is preferred whenever
+ * exploiters don't update several statistics of the same entity in one go.
+ */
+static inline void statistic_inc(struct statistic *stat, int i, s64 value)
+{
+ unsigned long flags;
+ local_irq_save(flags);
+ if (stat[i].state == STATISTIC_STATE_ON)
+ stat[i].add(&stat[i], smp_processor_id(), value, 1);
+ local_irq_restore(flags);
+}
+
+/**
+ * statistic_inc_nolock - update statistic with incremental data in (X, 1) pair
+ * @stat: struct statistic array
+ * @i: index of statistic to be updated
+ * @value: X
+ *
+ * The actual processing of the (X, Y) data pair is determined by the current
+ * definition applied to the statistic. See Documentation/statistics.txt.
+ *
+ * This variant leaves protecting per-cpu data to exploiters. It is preferred
+ * whenever exploiters update several statistics of the same entity in one go.
+ */
+static inline void statistic_inc_nolock(struct statistic *stat, int i,
+ s64 value)
+{
+ if (stat[i].state == STATISTIC_STATE_ON)
+ stat[i].add(&stat[i], smp_processor_id(), value, 1);
+}
+
+extern void statistic_set(struct statistic *, int, s64, u64);
+
+#else /* CONFIG_STATISTICS */
+
+static inline int statistic_create(struct statistic_interface *interface,
+ const char *name)
+{
+ return 0;
+}
+
+static inline int statistic_remove(
+ struct statistic_interface *interface_ptr)
+{
+ return 0;
+}
+
+static inline void statistic_add(struct statistic *stat, int i,
+ s64 value, u64 incr)
+{
+}
+
+static inline void statistic_add_nolock(struct statistic *stat, int i,
+ s64 value, u64 incr)
+{
+}
+
+static inline void statistic_inc(struct statistic *stat, int i, s64 value)
+{
+}
+
+static inline void statistic_inc_nolock(struct statistic *stat, int i,
+ s64 value)
+{
+}
+
+static inline void statistic_set(struct statistic *stat, int i,
+ s64 value, u64 total)
+{
+}
+
+#endif /* CONFIG_STATISTICS */
+
+#endif /* STATISTIC_H */
diff -Nurp a/lib/statistic.c b/lib/statistic.c
--- a/lib/statistic.c 1970-01-01 01:00:00.000000000 +0100
+++ b/lib/statistic.c 2006-05-19 16:22:55.000000000 +0200
@@ -0,0 +1,1459 @@
+/*
+ * lib/statistic.c
+ * statistics facility
+ *
+ * Copyright (C) 2005, 2006
+ * IBM Deutschland Entwicklung GmbH,
+ * IBM Corporation
+ *
+ * Author(s): Martin Peschke ([email protected]),
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ *
+ * another bunch of ideas being pondered:
+ * - define a set of agreed names or a naming scheme for
+ * consistency and comparability across exploiters;
+ * this entails an agreement about granularities
+ * as well (e.g. separate statistic for read/write/no-data commands);
+ * a common set of unit strings would be nice then, too, of course
+ * (e.g. "seconds", "milliseconds", "microseconds", ...)
+ * - perf. opt. of array: table lookup of values, binary search for values
+ * - another statistic disclipline based on some sort of tree, but
+ * similar in semantics to list discipline (for high-perf. histograms of
+ * discrete values)
+ * - allow for more than a single "view" on data at the same time by
+ * providing the capability to attach several (a list of) "definitions"
+ * to a struct statistic
+ * (e.g. show histogram of requests sizes and history of megabytes/sec.
+ * at the same time)
+ * - multi-dimensional statistic (combination of two or more
+ * characteristics/discriminators); worth the effort??
+ * (e.g. a matrix of occurences for latencies of requests of
+ * particular sizes)
+ *
+ * FIXME:
+ * - statistics file access when statistics are being removed
+ */
+
+#include <linux/fs.h>
+#include <linux/debugfs.h>
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/parser.h>
+#include <linux/time.h>
+#include <linux/sched.h>
+#include <linux/cpu.h>
+#include <linux/percpu.h>
+#include <linux/mutex.h>
+#include <linux/statistic.h>
+
+#include <asm/bug.h>
+#include <asm/uaccess.h>
+
+static struct statistic_discipline statistic_discs[];
+
+static inline int statistic_initialise(struct statistic *stat)
+{
+ stat->type = STATISTIC_TYPE_NONE;
+ stat->state = STATISTIC_STATE_UNCONFIGURED;
+ return 0;
+}
+
+static inline int statistic_uninitialise(struct statistic *stat)
+{
+ stat->state = STATISTIC_STATE_INVALID;
+ return 0;
+}
+
+static inline int statistic_define(struct statistic *stat)
+{
+ if (stat->type == STATISTIC_TYPE_NONE)
+ return -EINVAL;
+ stat->state = STATISTIC_STATE_RELEASED;
+ return 0;
+}
+
+static inline void statistic_reset_ptr(struct statistic *stat, void *ptr)
+{
+ struct statistic_discipline *disc = &statistic_discs[stat->type];
+ if (ptr)
+ disc->reset(stat, ptr);
+}
+
+static inline void statistic_move_ptr(struct statistic *stat, void *src)
+{
+ struct statistic_discipline *disc = &statistic_discs[stat->type];
+ unsigned long flags;
+ local_irq_save(flags);
+ disc->merge(stat, stat->pdata->ptrs[smp_processor_id()], src);
+ local_irq_restore(flags);
+}
+
+static inline void statistic_free_ptr(struct statistic *stat, void *ptr)
+{
+ struct statistic_discipline *disc = &statistic_discs[stat->type];
+ if (ptr) {
+ if (unlikely(disc->free))
+ disc->free(stat, ptr);
+ kfree(ptr);
+ }
+}
+
+static int statistic_free(struct statistic *stat, struct statistic_info *info)
+{
+ int cpu;
+ stat->state = STATISTIC_STATE_RELEASED;
+ if (unlikely(info->flags & STATISTIC_FLAGS_NOINCR)) {
+ statistic_free_ptr(stat, stat->pdata);
+ stat->pdata = NULL;
+ return 0;
+ }
+ for_each_cpu(cpu) {
+ statistic_free_ptr(stat, stat->pdata->ptrs[cpu]);
+ stat->pdata->ptrs[cpu] = NULL;
+ }
+ kfree(stat->pdata);
+ stat->pdata = NULL;
+ return 0;
+}
+
+static void * statistic_alloc_generic(struct statistic *stat, size_t size,
+ gfp_t flags, int node)
+{
+ return kmalloc_node(size, flags, node);
+}
+
+static void * statistic_alloc_ptr(struct statistic *stat, gfp_t flags, int node)
+{
+ struct statistic_discipline *disc = &statistic_discs[stat->type];
+ void *buf = disc->alloc(stat, disc->size, flags, node);
+ if (likely(buf))
+ statistic_reset_ptr(stat, buf);
+ return buf;
+}
+
+static int statistic_alloc(struct statistic *stat,
+ struct statistic_info *info)
+{
+ int cpu;
+ stat->age = sched_clock();
+ if (unlikely(info->flags & STATISTIC_FLAGS_NOINCR)) {
+ stat->pdata = statistic_alloc_ptr(stat, GFP_KERNEL, -1);
+ if (unlikely(!stat->pdata))
+ return -ENOMEM;
+ stat->state = STATISTIC_STATE_OFF;
+ return 0;
+ }
+ stat->pdata = kzalloc(sizeof(struct percpu_data), GFP_KERNEL);
+ if (unlikely(!stat->pdata))
+ return -ENOMEM;
+ for_each_online_cpu(cpu) {
+ stat->pdata->ptrs[cpu] = statistic_alloc_ptr(stat, GFP_KERNEL,
+ cpu_to_node(cpu));
+ if (unlikely(!stat->pdata->ptrs[cpu])) {
+ statistic_free(stat, info);
+ return -ENOMEM;
+ }
+ }
+ stat->state = STATISTIC_STATE_OFF;
+ return 0;
+}
+
+static inline int statistic_start(struct statistic *stat)
+{
+ stat->started = sched_clock();
+ stat->state = STATISTIC_STATE_ON;
+ return 0;
+}
+
+static void _statistic_barrier(void *unused)
+{
+}
+
+static inline int statistic_stop(struct statistic *stat)
+{
+ stat->stopped = sched_clock();
+ stat->state = STATISTIC_STATE_OFF;
+ /* ensures that all CPUs have ceased updating statistics */
+ smp_mb();
+ on_each_cpu(_statistic_barrier, NULL, 0, 1);
+ return 0;
+}
+
+static int statistic_transition(struct statistic *stat,
+ struct statistic_info *info,
+ enum statistic_state requested_state)
+{
+ int z = (requested_state < stat->state ? 1 : 0);
+ int retval = -EINVAL;
+
+ while (stat->state != requested_state) {
+ switch (stat->state) {
+ case STATISTIC_STATE_INVALID:
+ retval = ( z ? -EINVAL : statistic_initialise(stat) );
+ break;
+ case STATISTIC_STATE_UNCONFIGURED:
+ retval = ( z ? statistic_uninitialise(stat)
+ : statistic_define(stat) );
+ break;
+ case STATISTIC_STATE_RELEASED:
+ retval = ( z ? statistic_initialise(stat)
+ : statistic_alloc(stat, info) );
+ break;
+ case STATISTIC_STATE_OFF:
+ retval = ( z ? statistic_free(stat, info)
+ : statistic_start(stat) );
+ break;
+ case STATISTIC_STATE_ON:
+ retval = ( z ? statistic_stop(stat) : -EINVAL );
+ break;
+ }
+ if (unlikely(retval))
+ return retval;
+ }
+ return 0;
+}
+
+static int statistic_reset(struct statistic *stat, struct statistic_info *info)
+{
+ enum statistic_state prev_state = stat->state;
+ int cpu;
+
+ if (unlikely(stat->state < STATISTIC_STATE_OFF))
+ return 0;
+ statistic_transition(stat, info, STATISTIC_STATE_OFF);
+ if (unlikely(info->flags & STATISTIC_FLAGS_NOINCR))
+ statistic_reset_ptr(stat, stat->pdata);
+ else
+ for_each_cpu(cpu)
+ statistic_reset_ptr(stat, stat->pdata->ptrs[cpu]);
+ stat->age = sched_clock();
+ statistic_transition(stat, info, prev_state);
+ return 0;
+}
+
+static void statistic_merge(void *__mpriv)
+{
+ struct statistic_merge_private *mpriv = __mpriv;
+ struct statistic *stat = mpriv->stat;
+ struct statistic_discipline *disc = &statistic_discs[stat->type];
+ spin_lock(&mpriv->lock);
+ disc->merge(stat, mpriv->dst, stat->pdata->ptrs[smp_processor_id()]);
+ spin_unlock(&mpriv->lock);
+}
+
+/**
+ * statistic_set - set statistic using total numbers in (X, Y) data pair
+ * @stat: struct statistic array
+ * @i: index of statistic to be updated
+ * @value: X
+ * @total: Y
+ *
+ * The actual processing of the (X, Y) data pair is determined by the current
+ * definition applied to the statistic. See Documentation/statistics.txt.
+ *
+ * There is no distinction between a concurrency protected and unprotected
+ * statistic_set() flavour needed. statistic_set() may only
+ * be called when we pull statistic updates from exploiters. The statistics
+ * infrastructure guarantees serialisation for that. Exploiters must not
+ * intermix statistic_set() and statistic_add/inc() anyway. That is why,
+ * concurrent updates won't happen and there is no additional protection
+ * required for statistics fed through statistic_set().
+ */
+void statistic_set(struct statistic *stat, int i, s64 value, u64 total)
+{
+ struct statistic_discipline *disc = &statistic_discs[stat[i].type];
+ if (stat[i].state == STATISTIC_STATE_ON)
+ disc->set(&stat[i], value, total);
+}
+
+static struct sgrb_seg * sgrb_seg_find(struct list_head *lh, int size)
+{
+ struct sgrb_seg *seg;
+
+ /* only the last buffer, if any, may have spare bytes */
+ list_for_each_entry_reverse(seg, lh, list) {
+ if (likely((PAGE_SIZE - seg->offset) >= size))
+ return seg;
+ break;
+ }
+ seg = kzalloc(sizeof(struct sgrb_seg), GFP_KERNEL);
+ if (unlikely(!seg))
+ return NULL;
+ seg->size = PAGE_SIZE;
+ seg->address = (void*)__get_free_page(GFP_KERNEL);
+ if (unlikely(!seg->address)) {
+ kfree(seg);
+ return NULL;
+ }
+ list_add_tail(&seg->list, lh);
+ return seg;
+}
+
+static void sgrb_seg_release_all(struct list_head *lh)
+{
+ struct sgrb_seg *seg, *tmp;
+
+ list_for_each_entry_safe(seg, tmp, lh, list) {
+ list_del(&seg->list);
+ free_page((unsigned long)seg->address);
+ kfree(seg);
+ }
+}
+
+static char * statistic_state_strings[] = {
+ "undefined(BUG)",
+ "unconfigured",
+ "released",
+ "off",
+ "on",
+};
+
+static int statistic_fdef(struct statistic_interface *interface, int i,
+ struct statistic_file_private *private)
+{
+ struct statistic *stat = &interface->stat[i];
+ struct statistic_info *info = &interface->info[i];
+ struct statistic_discipline *disc = &statistic_discs[stat->type];
+ struct sgrb_seg *seg;
+ char t0[TIMESTAMP_SIZE], t1[TIMESTAMP_SIZE], t2[TIMESTAMP_SIZE];
+
+ seg = sgrb_seg_find(&private->read_seg_lh, 512);
+ if (unlikely(!seg))
+ return -ENOMEM;
+
+ seg->offset += sprintf(seg->address + seg->offset,
+ "name=%s state=%s units=%s/%s",
+ info->name, statistic_state_strings[stat->state],
+ info->x_unit, info->y_unit);
+ if (stat->state == STATISTIC_STATE_UNCONFIGURED) {
+ seg->offset += sprintf(seg->address + seg->offset, "\n");
+ return 0;
+ }
+
+ seg->offset += sprintf(seg->address + seg->offset, " type=%s",
+ disc->name);
+ if (disc->fdef)
+ seg->offset += disc->fdef(stat, seg->address + seg->offset);
+ if (stat->state == STATISTIC_STATE_RELEASED) {
+ seg->offset += sprintf(seg->address + seg->offset, "\n");
+ return 0;
+ }
+
+ nsec_to_timestamp(t0, stat->age);
+ nsec_to_timestamp(t1, stat->started);
+ nsec_to_timestamp(t2, stat->stopped);
+ seg->offset += sprintf(seg->address + seg->offset,
+ " data=%s started=%s stopped=%s\n", t0, t1, t2);
+ return 0;
+}
+
+static inline int statistic_fdata(struct statistic_interface *interface, int i,
+ struct statistic_file_private *fpriv)
+{
+ struct statistic *stat = &interface->stat[i];
+ struct statistic_info *info = &interface->info[i];
+ struct statistic_discipline *disc = &statistic_discs[stat->type];
+ struct statistic_merge_private mpriv;
+ int retval;
+
+ if (unlikely(stat->state < STATISTIC_STATE_OFF))
+ return 0;
+ if (unlikely(info->flags & STATISTIC_FLAGS_NOINCR))
+ return disc->fdata(stat, info->name, fpriv, stat->pdata);
+ mpriv.dst = statistic_alloc_ptr(stat, GFP_KERNEL, -1);
+ if (unlikely(!mpriv.dst))
+ return -ENOMEM;
+ spin_lock_init(&mpriv.lock);
+ mpriv.stat = stat;
+ on_each_cpu(statistic_merge, &mpriv, 0, 1);
+ retval = disc->fdata(stat, info->name, fpriv, mpriv.dst);
+ statistic_free_ptr(stat, mpriv.dst);
+ return retval;
+}
+
+/* cpu hotplug handling for per-cpu data */
+
+static inline int _statistic_hotcpu(struct statistic_interface *interface,
+ int i, unsigned long action, int cpu)
+{
+ struct statistic *stat = &interface->stat[i];
+ struct statistic_info *info = &interface->info[i];
+
+ if (unlikely(info->flags & STATISTIC_FLAGS_NOINCR))
+ return 0;
+ if (stat->state < STATISTIC_STATE_OFF)
+ return 0;
+ switch (action) {
+ case CPU_UP_PREPARE:
+ stat->pdata->ptrs[cpu] = statistic_alloc_ptr(stat, GFP_ATOMIC,
+ cpu_to_node(cpu));
+ break;
+ case CPU_UP_CANCELED:
+ case CPU_DEAD:
+ statistic_move_ptr(stat, stat->pdata->ptrs[cpu]);
+ statistic_free_ptr(stat, stat->pdata->ptrs[cpu]);
+ stat->pdata->ptrs[cpu] = NULL;
+ break;
+ }
+ return 0;
+}
+
+static struct list_head statistic_list;
+static struct mutex statistic_list_mutex;
+
+static int __cpuinit statistic_hotcpu(struct notifier_block *notifier,
+ unsigned long action, void *__cpu)
+{
+ int cpu = (unsigned long)__cpu, i;
+ struct statistic_interface *interface;
+
+ mutex_lock(&statistic_list_mutex);
+ list_for_each_entry(interface, &statistic_list, list)
+ for (i = 0; i < interface->number; i++)
+ _statistic_hotcpu(interface, i, action, cpu);
+ mutex_unlock(&statistic_list_mutex);
+ return NOTIFY_OK;
+}
+
+static struct notifier_block statistic_hotcpu_notifier =
+{
+ .notifier_call = statistic_hotcpu,
+};
+
+/* module startup / removal */
+
+static struct dentry *statistic_root_dir;
+
+int __init statistic_init(void)
+{
+ statistic_root_dir = debugfs_create_dir(STATISTIC_ROOT_DIR, NULL);
+ if (unlikely(!statistic_root_dir))
+ return -ENOMEM;
+ INIT_LIST_HEAD(&statistic_list);
+ mutex_init(&statistic_list_mutex);
+ register_cpu_notifier(&statistic_hotcpu_notifier);
+ return 0;
+}
+
+void __exit statistic_exit(void)
+{
+ unregister_cpu_notifier(&statistic_hotcpu_notifier);
+ debugfs_remove(statistic_root_dir);
+}
+
+/* parser used for configuring statistics */
+
+static int statistic_parse_single(struct statistic *stat,
+ struct statistic_info *info,
+ char *def, int type)
+{
+ struct statistic_discipline *disc = &statistic_discs[type];
+ int prev_state = stat->state, retval = 0;
+ char *copy;
+
+ if (disc->parse) {
+ copy = kstrdup(def, GFP_KERNEL);
+ if (unlikely(!copy))
+ return -ENOMEM;
+ retval = disc->parse(stat, info, type, copy);
+ kfree(copy);
+ } else if (type != stat->type)
+ statistic_transition(stat, info, STATISTIC_STATE_UNCONFIGURED);
+ if (!retval) {
+ stat->type = type;
+ stat->add = disc->add;
+ }
+ statistic_transition(stat, info,
+ max(prev_state, STATISTIC_STATE_RELEASED));
+ return retval;
+}
+
+static match_table_t statistic_match_type = {
+ {1, "type=%s"},
+ {9, NULL}
+};
+
+static int statistic_parse_match(struct statistic *stat,
+ struct statistic_info *info, char *def)
+{
+ int type, len;
+ char *p, *copy, *twisted;
+ substring_t args[MAX_OPT_ARGS];
+ struct statistic_discipline *disc;
+
+ if (!def)
+ def = info->defaults;
+ twisted = copy = kstrdup(def, GFP_KERNEL);
+ if (unlikely(!copy))
+ return -ENOMEM;
+ while ((p = strsep(&twisted, " ")) != NULL) {
+ if (!*p)
+ continue;
+ if (match_token(p, statistic_match_type, args) != 1)
+ continue;
+ len = (args[0].to - args[0].from) + 1;
+ for (type = 0; type < STATISTIC_TYPE_NONE; type++) {
+ disc = &statistic_discs[type];
+ if (unlikely(strncmp(disc->name, args[0].from, len)))
+ continue;
+ kfree(copy);
+ return statistic_parse_single(stat, info, def, type);
+ }
+ }
+ kfree(copy);
+ if (unlikely(stat->type == STATISTIC_TYPE_NONE))
+ return -EINVAL;
+ return statistic_parse_single(stat, info, def, stat->type);
+}
+
+static match_table_t statistic_match_common = {
+ {STATISTIC_STATE_UNCONFIGURED, "state=unconfigured"},
+ {STATISTIC_STATE_RELEASED, "state=released"},
+ {STATISTIC_STATE_OFF, "state=off"},
+ {STATISTIC_STATE_ON, "state=on"},
+ {1001, "name=%s"},
+ {1002, "data=reset"},
+ {1003, "defaults"},
+ {9999, NULL}
+};
+
+static void statistic_parse_line(struct statistic_interface *interface,
+ char *def)
+{
+ char *p, *copy, *twisted, *name = NULL;
+ substring_t args[MAX_OPT_ARGS];
+ int token, reset = 0, defaults = 0, i;
+ int state = STATISTIC_STATE_INVALID;
+ struct statistic *stat = interface->stat;
+ struct statistic_info *info = interface->info;
+
+ if (unlikely(!def))
+ return;
+ twisted = copy = kstrdup(def, GFP_KERNEL);
+ if (unlikely(!copy))
+ return;
+
+ while ((p = strsep(&twisted, " ")) != NULL) {
+ if (!*p)
+ continue;
+ token = match_token(p, statistic_match_common, args);
+ switch (token) {
+ case STATISTIC_STATE_UNCONFIGURED:
+ case STATISTIC_STATE_RELEASED:
+ case STATISTIC_STATE_OFF:
+ case STATISTIC_STATE_ON:
+ state = token;
+ break;
+ case 1001:
+ if (likely(!name))
+ name = match_strdup(&args[0]);
+ break;
+ case 1002:
+ reset = 1;
+ break;
+ case 1003:
+ defaults = 1;
+ break;
+ }
+ }
+ for (i = 0; i < interface->number; i++, stat++, info++) {
+ if (!name || (name && !strcmp(name, info->name))) {
+ if (defaults)
+ statistic_parse_match(stat, info, NULL);
+ if (name)
+ statistic_parse_match(stat, info, def);
+ if (state != STATISTIC_STATE_INVALID)
+ statistic_transition(stat, info, state);
+ if (reset)
+ statistic_reset(stat, info);
+ }
+ }
+ kfree(copy);
+ kfree(name);
+}
+
+static void statistic_parse(struct statistic_interface *interface,
+ struct list_head *line_lh, size_t line_size)
+{
+ struct sgrb_seg *seg, *tmp;
+ char *buf;
+ int offset = 0;
+
+ if (unlikely(!line_size))
+ return;
+ buf = kmalloc(line_size + 2, GFP_KERNEL);
+ if (unlikely(!buf))
+ return;
+ buf[line_size] = ' ';
+ buf[line_size + 1] = '\0';
+ list_for_each_entry_safe(seg, tmp, line_lh, list) {
+ memcpy(buf + offset, seg->address, seg->size);
+ offset += seg->size;
+ list_del(&seg->list);
+ kfree(seg);
+ }
+ statistic_parse_line(interface, buf);
+ kfree(buf);
+}
+
+/* sequential files comprising user interface */
+
+static int statistic_generic_open(struct inode *inode,
+ struct file *file, struct statistic_interface **interface,
+ struct statistic_file_private **private)
+{
+ *interface = inode->u.generic_ip;
+ BUG_ON(!interface);
+ *private = kzalloc(sizeof(struct statistic_file_private), GFP_KERNEL);
+ if (unlikely(!*private))
+ return -ENOMEM;
+ INIT_LIST_HEAD(&(*private)->read_seg_lh);
+ INIT_LIST_HEAD(&(*private)->write_seg_lh);
+ file->private_data = *private;
+ return 0;
+}
+
+static int statistic_generic_close(struct inode *inode, struct file *file)
+{
+ struct statistic_file_private *private = file->private_data;
+ BUG_ON(!private);
+ sgrb_seg_release_all(&private->read_seg_lh);
+ sgrb_seg_release_all(&private->write_seg_lh);
+ kfree(private);
+ return 0;
+}
+
+static ssize_t statistic_generic_read(struct file *file,
+ char __user *buf, size_t len, loff_t *offset)
+{
+ struct statistic_file_private *private = file->private_data;
+ struct sgrb_seg *seg;
+ size_t seg_offset, seg_residual, seg_transfer;
+ size_t transfered = 0;
+ loff_t pos = 0;
+
+ BUG_ON(!private);
+ list_for_each_entry(seg, &private->read_seg_lh, list) {
+ if (unlikely(!len))
+ break;
+ if (*offset >= pos && *offset <= (pos + seg->offset)) {
+ seg_offset = *offset - pos;
+ seg_residual = seg->offset - seg_offset;
+ seg_transfer = min(len, seg_residual);
+ if (unlikely(copy_to_user(buf + transfered,
+ seg->address + seg_offset,
+ seg_transfer)))
+ return -EFAULT;
+ transfered += seg_transfer;
+ *offset += seg_transfer;
+ pos += seg_transfer + seg_offset;
+ len -= seg_transfer;
+ } else
+ pos += seg->offset;
+ }
+ return transfered;
+}
+
+static ssize_t statistic_generic_write(struct file *file,
+ const char __user *buf, size_t len, loff_t *offset)
+{
+ struct statistic_file_private *private = file->private_data;
+ struct sgrb_seg *seg;
+ size_t seg_residual, seg_transfer;
+ size_t transfered = 0;
+
+ BUG_ON(!private);
+ if (unlikely(*offset != private->write_seg_total_size))
+ return -EPIPE;
+ while (len) {
+ seg = sgrb_seg_find(&private->write_seg_lh, 1);
+ if (unlikely(!seg))
+ return -ENOMEM;
+ seg_residual = seg->size - seg->offset;
+ seg_transfer = min(len, seg_residual);
+ if (unlikely(copy_from_user(seg->address + seg->offset,
+ buf + transfered, seg_transfer)))
+ return -EFAULT;
+ private->write_seg_total_size += seg_transfer;
+ seg->offset += seg_transfer;
+ transfered += seg_transfer;
+ *offset += seg_transfer;
+ len -= seg_transfer;
+ }
+ return transfered;
+}
+
+static int statistic_def_close(struct inode *inode, struct file *file)
+{
+ struct statistic_interface *interface = inode->u.generic_ip;
+ struct statistic_file_private *private = file->private_data;
+ struct sgrb_seg *seg, *seg_nl;
+ int offset;
+ struct list_head line_lh;
+ char *nl;
+ size_t line_size = 0;
+
+ INIT_LIST_HEAD(&line_lh);
+ list_for_each_entry(seg, &private->write_seg_lh, list) {
+ for (offset = 0; offset < seg->offset; offset += seg_nl->size) {
+ seg_nl = kmalloc(sizeof(struct sgrb_seg), GFP_KERNEL);
+ if (unlikely(!seg_nl))
+ /*
+ * FIXME:
+ * Should we omit other new settings because we
+ * could not process this line of definitions?
+ */
+ continue;
+ seg_nl->address = seg->address + offset;
+ nl = strnchr(seg_nl->address,
+ seg->offset - offset, '\n');
+ if (nl) {
+ seg_nl->offset = nl - seg_nl->address;
+ if (seg_nl->offset)
+ seg_nl->offset--;
+ } else
+ seg_nl->offset = seg->offset - offset;
+ seg_nl->size = seg_nl->offset + 1;
+ line_size += seg_nl->size;
+ list_add_tail(&seg_nl->list, &line_lh);
+ if (nl) {
+ statistic_parse(interface, &line_lh, line_size);
+ line_size = 0;
+ }
+ }
+ }
+ if (!list_empty(&line_lh))
+ statistic_parse(interface, &line_lh, line_size);
+ return statistic_generic_close(inode, file);
+}
+
+static int statistic_def_open(struct inode *inode, struct file *file)
+{
+ struct statistic_interface *interface;
+ struct statistic_file_private *private;
+ int retval = 0;
+ int i;
+
+ retval = statistic_generic_open(inode, file, &interface, &private);
+ if (unlikely(retval))
+ return retval;
+ for (i = 0; i < interface->number; i++) {
+ retval = statistic_fdef(interface, i, private);
+ if (unlikely(retval)) {
+ statistic_def_close(inode, file);
+ break;
+ }
+ }
+ return retval;
+}
+
+static int statistic_data_open(struct inode *inode, struct file *file)
+{
+ struct statistic_interface *interface;
+ struct statistic_file_private *private;
+ int retval = 0;
+ int i;
+
+ retval = statistic_generic_open(inode, file, &interface, &private);
+ if (unlikely(retval))
+ return retval;
+ if (interface->pull)
+ interface->pull(interface->pull_private);
+ for (i = 0; i < interface->number; i++) {
+ retval = statistic_fdata(interface, i, private);
+ if (unlikely(retval)) {
+ statistic_generic_close(inode, file);
+ break;
+ }
+ }
+ return retval;
+}
+
+static struct file_operations statistic_def_fops = {
+ .owner = THIS_MODULE,
+ .read = statistic_generic_read,
+ .write = statistic_generic_write,
+ .open = statistic_def_open,
+ .release = statistic_def_close,
+};
+
+static struct file_operations statistic_data_fops = {
+ .owner = THIS_MODULE,
+ .read = statistic_generic_read,
+ .open = statistic_data_open,
+ .release = statistic_generic_close,
+};
+
+/**
+ * statistic_create - setup statistics and create debugfs files
+ * @interface: struct statistic_interface provided by exploiter
+ * @name: name of debugfs directory to be created
+ *
+ * Creates a debugfs directory in "statistics" as well as the "data" and
+ * "definition" files. Then we attach setup statistics according to the
+ * definition provided by exploiter through struct statistic_interface.
+ *
+ * struct statistic_interface must have been set up prior to calling this.
+ *
+ * On success, 0 is returned.
+ *
+ * If some required memory could not be allocated, or the creation
+ * of debugfs entries failed, this routine fails, and -ENOMEM is returned.
+ */
+int statistic_create(struct statistic_interface *interface, const char *name)
+{
+ struct statistic *stat = interface->stat;
+ struct statistic_info *info = interface->info;
+ int i;
+
+ BUG_ON(!stat || !info || !interface->number);
+
+ interface->debugfs_dir =
+ debugfs_create_dir(name, statistic_root_dir);
+ if (unlikely(!interface->debugfs_dir))
+ return -ENOMEM;
+
+ interface->data_file = debugfs_create_file(
+ STATISTIC_FILENAME_DATA, S_IFREG | S_IRUSR,
+ interface->debugfs_dir, (void*)interface, &statistic_data_fops);
+ if (unlikely(!interface->data_file)) {
+ debugfs_remove(interface->debugfs_dir);
+ return -ENOMEM;
+ }
+
+ interface->def_file = debugfs_create_file(
+ STATISTIC_FILENAME_DEF, S_IFREG | S_IRUSR | S_IWUSR,
+ interface->debugfs_dir, (void*)interface, &statistic_def_fops);
+ if (unlikely(!interface->def_file)) {
+ debugfs_remove(interface->data_file);
+ debugfs_remove(interface->debugfs_dir);
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < interface->number; i++, stat++, info++) {
+ statistic_transition(stat, info, STATISTIC_STATE_UNCONFIGURED);
+ statistic_parse_match(stat, info, NULL);
+ }
+
+ mutex_lock(&statistic_list_mutex);
+ list_add(&interface->list, &statistic_list);
+ mutex_unlock(&statistic_list_mutex);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(statistic_create);
+
+/**
+ * statistic_remove - remove unused statistics
+ * @interface: struct statistic_interface to clean up
+ *
+ * Remove a debugfs directory in "statistics" along with its "data" and
+ * "definition" files. Removing this user interface also causes the removal
+ * of all statistics attached to the interface.
+ *
+ * The exploiter must have ceased reporting statistic data.
+ *
+ * Returns -EINVAL for attempted double removal, 0 otherwise.
+ */
+int statistic_remove(struct statistic_interface *interface)
+{
+ struct statistic *stat = interface->stat;
+ struct statistic_info *info = interface->info;
+ int i;
+
+ if (unlikely(!interface->debugfs_dir))
+ return -EINVAL;
+ mutex_lock(&statistic_list_mutex);
+ list_del(&interface->list);
+ mutex_unlock(&statistic_list_mutex);
+ for (i = 0; i < interface->number; i++, stat++, info++)
+ statistic_transition(stat, info, STATISTIC_STATE_INVALID);
+ debugfs_remove(interface->data_file);
+ debugfs_remove(interface->def_file);
+ debugfs_remove(interface->debugfs_dir);
+ interface->debugfs_dir = NULL;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(statistic_remove);
+
+/* code concerned with single value statistics */
+
+static void statistic_reset_counter(struct statistic *stat, void *ptr)
+{
+ *(u64*)ptr = 0;
+}
+
+static void statistic_add_counter_inc(struct statistic *stat, int cpu,
+ s64 value, u64 incr)
+{
+ *(u64*)stat->pdata->ptrs[cpu] += incr;
+}
+
+static void statistic_add_counter_prod(struct statistic *stat, int cpu,
+ s64 value, u64 incr)
+{
+ if (unlikely(value < 0))
+ value = -value;
+ *(u64*)stat->pdata->ptrs[cpu] += value * incr;
+}
+
+static void statistic_set_counter_inc(struct statistic *stat,
+ s64 value, u64 total)
+{
+ *(u64*)stat->pdata = total;
+}
+
+static void statistic_set_counter_prod(struct statistic *stat,
+ s64 value, u64 total)
+{
+ if (unlikely(value < 0))
+ value = -value;
+ *(u64*)stat->pdata = value * total;
+}
+
+static void statistic_merge_counter(struct statistic *stat,
+ void *dst, void *src)
+{
+ *(u64*)dst += *(u64*)src;
+}
+
+static int statistic_fdata_counter(struct statistic *stat, const char *name,
+ struct statistic_file_private *fpriv,
+ void *data)
+{
+ struct sgrb_seg *seg;
+ seg = sgrb_seg_find(&fpriv->read_seg_lh, 128);
+ if (unlikely(!seg))
+ return -ENOMEM;
+ seg->offset += sprintf(seg->address + seg->offset, "%s %Lu\n",
+ name, *(unsigned long long *)data);
+ return 0;
+}
+
+/* code concerned with utilisation indicator statistic */
+
+static void statistic_reset_util(struct statistic *stat, void *ptr)
+{
+ struct statistic_entry_util *util = ptr;
+ util->num = 0;
+ util->acc = 0;
+ util->min = (~0ULL >> 1) - 1;
+ util->max = -(~0ULL >> 1) + 1;
+}
+
+static void statistic_add_util(struct statistic *stat, int cpu,
+ s64 value, u64 incr)
+{
+ struct statistic_entry_util *util = stat->pdata->ptrs[cpu];
+ util->num += incr;
+ util->acc += value * incr;
+ if (unlikely(value < util->min))
+ util->min = value;
+ if (unlikely(value > util->max))
+ util->max = value;
+}
+
+static void statistic_set_util(struct statistic *stat, s64 value, u64 total)
+{
+ struct statistic_entry_util *util;
+ util = (struct statistic_entry_util *) stat->pdata;
+ util->num = total;
+ util->acc = value * total;
+ if (unlikely(value < util->min))
+ util->min = value;
+ if (unlikely(value > util->max))
+ util->max = value;
+}
+
+static void statistic_merge_util(struct statistic *stat, void *_dst, void *_src)
+{
+ struct statistic_entry_util *dst = _dst, *src = _src;
+ dst->num += src->num;
+ dst->acc += src->acc;
+ if (unlikely(src->min < dst->min))
+ dst->min = src->min;
+ if (unlikely(src->max > dst->max))
+ dst->max = src->max;
+}
+
+static int statistic_fdata_util(struct statistic *stat, const char *name,
+ struct statistic_file_private *fpriv,
+ void *data)
+{
+ struct sgrb_seg *seg;
+ struct statistic_entry_util *util = data;
+ unsigned long long whole = 0;
+ signed long long min = 0, max = 0, decimal = 0, last_digit;
+
+ seg = sgrb_seg_find(&fpriv->read_seg_lh, 128);
+ if (unlikely(!seg))
+ return -ENOMEM;
+ if (likely(util->num)) {
+ whole = util->acc;
+ do_div(whole, util->num);
+ decimal = util->acc * 10000;
+ do_div(decimal, util->num);
+ decimal -= whole * 10000;
+ if (decimal < 0)
+ decimal = -decimal;
+ last_digit = decimal;
+ do_div(last_digit, 10);
+ last_digit = decimal - last_digit * 10;
+ if (last_digit >= 5)
+ decimal += 10;
+ do_div(decimal, 10);
+ min = util->min;
+ max = util->max;
+ }
+ seg->offset += sprintf(seg->address + seg->offset,
+ "%s %Lu %Ld %Ld.%03lld %Ld\n", name,
+ (unsigned long long)util->num,
+ (signed long long)min, whole, decimal,
+ (signed long long)max);
+ return 0;
+}
+
+/* code concerned with histogram statistics */
+
+static void * statistic_alloc_histogram(struct statistic *stat, size_t size,
+ gfp_t flags, int node)
+{
+ return kmalloc_node(size * (stat->u.histogram.last_index + 1),
+ flags, node);
+}
+
+static inline s64 statistic_histogram_calc_value_lin(struct statistic *stat,
+ int i)
+{
+ return stat->u.histogram.range_min +
+ stat->u.histogram.base_interval * i;
+}
+
+static inline s64 statistic_histogram_calc_value_log2(struct statistic *stat,
+ int i)
+{
+ return stat->u.histogram.range_min +
+ (i ? (stat->u.histogram.base_interval << (i - 1)) : 0);
+}
+
+static inline s64 statistic_histogram_calc_value(struct statistic *stat, int i)
+{
+ if (stat->type == STATISTIC_TYPE_HISTOGRAM_LIN)
+ return statistic_histogram_calc_value_lin(stat, i);
+ else
+ return statistic_histogram_calc_value_log2(stat, i);
+}
+
+static inline int statistic_histogram_calc_index_lin(struct statistic *stat,
+ s64 value)
+{
+ unsigned long long i = value - stat->u.histogram.range_min;
+ do_div(i, stat->u.histogram.base_interval);
+ return i;
+}
+
+static inline int statistic_histogram_calc_index_log2(struct statistic *stat,
+ s64 value)
+{
+ unsigned long long i;
+ for (i = 0;
+ i < stat->u.histogram.last_index &&
+ value > statistic_histogram_calc_value_log2(stat, i);
+ i++);
+ return i;
+}
+
+static inline int statistic_histogram_calc_index(struct statistic *stat,
+ s64 value)
+{
+ if (stat->type == STATISTIC_TYPE_HISTOGRAM_LIN)
+ return statistic_histogram_calc_index_lin(stat, value);
+ else
+ return statistic_histogram_calc_index_log2(stat, value);
+}
+
+static void statistic_reset_histogram(struct statistic *stat, void *ptr)
+{
+ memset(ptr, 0, (stat->u.histogram.last_index + 1) * sizeof(u64));
+}
+
+static void statistic_add_histogram_lin(struct statistic *stat, int cpu,
+ s64 value, u64 incr)
+{
+ int i = statistic_histogram_calc_index_lin(stat, value);
+ ((u64*)stat->pdata->ptrs[cpu])[i] += incr;
+}
+
+static void statistic_add_histogram_log2(struct statistic *stat, int cpu,
+ s64 value, u64 incr)
+{
+ int i = statistic_histogram_calc_index_log2(stat, value);
+ ((u64*)stat->pdata->ptrs[cpu])[i] += incr;
+}
+
+static void statistic_set_histogram_lin(struct statistic *stat,
+ s64 value, u64 total)
+{
+ int i = statistic_histogram_calc_index_lin(stat, value);
+ ((u64*)stat->pdata)[i] = total;
+}
+
+static void statistic_set_histogram_log2(struct statistic *stat,
+ s64 value, u64 total)
+{
+ int i = statistic_histogram_calc_index_log2(stat, value);
+ ((u64*)stat->pdata)[i] = total;
+}
+
+static void statistic_merge_histogram(struct statistic *stat,
+ void *_dst, void *_src)
+{
+ u64 *dst = _dst, *src = _src;
+ int i;
+ for (i = 0; i <= stat->u.histogram.last_index; i++)
+ dst[i] += src[i];
+}
+
+static inline int statistic_fdata_histogram_line(const char *name,
+ struct statistic_file_private *private,
+ const char *prefix, s64 bound, u64 hits)
+{
+ struct sgrb_seg *seg;
+ seg = sgrb_seg_find(&private->read_seg_lh, 256);
+ if (unlikely(!seg))
+ return -ENOMEM;
+ seg->offset += sprintf(seg->address + seg->offset, "%s %s%Ld %Lu\n",
+ name, prefix, (signed long long)bound,
+ (unsigned long long)hits);
+ return 0;
+}
+
+static int statistic_fdata_histogram(struct statistic *stat, const char *name,
+ struct statistic_file_private *fpriv,
+ void *data)
+{
+ int i, retval;
+ s64 bound = 0;
+ for (i = 0; i < (stat->u.histogram.last_index); i++) {
+ bound = statistic_histogram_calc_value(stat, i);
+ retval = statistic_fdata_histogram_line(name, fpriv, "<=",
+ bound, ((u64*)data)[i]);
+ if (unlikely(retval))
+ return retval;
+ }
+ return statistic_fdata_histogram_line(name, fpriv, ">",
+ bound, ((u64*)data)[i]);
+}
+
+static int statistic_fdef_histogram(struct statistic *stat, char *line)
+{
+ return sprintf(line, " range_min=%Li entries=%Li base_interval=%Lu",
+ (signed long long)stat->u.histogram.range_min,
+ (unsigned long long)(stat->u.histogram.last_index + 1),
+ (unsigned long long)stat->u.histogram.base_interval);
+}
+
+static match_table_t statistic_match_histogram = {
+ {1, "entries=%u"},
+ {2, "base_interval=%s"},
+ {3, "range_min=%s"},
+ {9, NULL}
+};
+
+static int statistic_parse_histogram(struct statistic *stat,
+ struct statistic_info *info,
+ int type, char *def)
+{
+ char *p;
+ substring_t args[MAX_OPT_ARGS];
+ int token, got_entries = 0, got_interval = 0, got_range = 0;
+ u32 entries, base_interval;
+ s64 range_min;
+
+ while ((p = strsep(&def, " ")) != NULL) {
+ if (!*p)
+ continue;
+ token = match_token(p, statistic_match_histogram, args);
+ switch (token) {
+ case 1:
+ match_int(&args[0], &entries);
+ got_entries = 1;
+ break;
+ case 2:
+ match_int(&args[0], &base_interval);
+ got_interval = 1;
+ break;
+ case 3:
+ match_s64(&args[0], &range_min, 0);
+ got_range = 1;
+ break;
+ }
+ }
+ if (unlikely(type != stat->type &&
+ !(got_entries && got_interval && got_range)))
+ return -EINVAL;
+ statistic_transition(stat, info, STATISTIC_STATE_UNCONFIGURED);
+ if (got_entries)
+ stat->u.histogram.last_index = entries - 1;
+ if (got_interval)
+ stat->u.histogram.base_interval = base_interval;
+ if (got_range)
+ stat->u.histogram.range_min = range_min;
+ return 0;
+}
+
+/* code concerned with histograms (discrete value) statistics */
+
+static void * statistic_alloc_sparse(struct statistic *stat, size_t size,
+ gfp_t flags, int node)
+{
+ struct statistic_sparse_list *slist = kmalloc_node(size, flags, node);
+ INIT_LIST_HEAD(&slist->entry_lh);
+ slist->entries_max = stat->u.sparse.entries_max;
+ return slist;
+}
+
+static void statistic_free_sparse(struct statistic *stat, void *ptr)
+{
+ struct statistic_entry_sparse *entry, *tmp;
+ struct statistic_sparse_list *slist = ptr;
+ list_for_each_entry_safe(entry, tmp, &slist->entry_lh, list) {
+ list_del(&entry->list);
+ kfree(entry);
+ }
+ slist->hits_missed = 0;
+ slist->entries = 0;
+}
+
+static inline void statistic_add_sparse_sort(struct list_head *head,
+ struct statistic_entry_sparse *entry)
+{
+ struct statistic_entry_sparse *sort =
+ list_prepare_entry(entry, head, list);
+
+ list_for_each_entry_continue_reverse(sort, head, list)
+ if (likely(sort->hits >= entry->hits))
+ break;
+ if (unlikely(sort->list.next != &entry->list &&
+ (&sort->list == head || sort->hits >= entry->hits)))
+ list_move(&entry->list, &sort->list);
+}
+
+static inline int statistic_add_sparse_new(struct statistic_sparse_list *slist,
+ s64 value, u64 incr)
+{
+ struct statistic_entry_sparse *entry;
+
+ if (unlikely(slist->entries == slist->entries_max))
+ return -ENOMEM;
+ entry = kmalloc(sizeof(struct statistic_entry_sparse), GFP_ATOMIC);
+ if (unlikely(!entry))
+ return -ENOMEM;
+ entry->value = value;
+ entry->hits = incr;
+ slist->entries++;
+ list_add_tail(&entry->list, &slist->entry_lh);
+ return 0;
+}
+
+static inline void _statistic_add_sparse(struct statistic_sparse_list *slist,
+ s64 value, u64 incr)
+{
+ struct list_head *head = &slist->entry_lh;
+ struct statistic_entry_sparse *entry;
+
+ list_for_each_entry(entry, head, list) {
+ if (likely(entry->value == value)) {
+ entry->hits += incr;
+ statistic_add_sparse_sort(head, entry);
+ return;
+ }
+ }
+ if (unlikely(statistic_add_sparse_new(slist, value, incr)))
+ slist->hits_missed += incr;
+}
+
+static void statistic_add_sparse(struct statistic *stat, int cpu,
+ s64 value, u64 incr)
+{
+ struct statistic_sparse_list *slist = stat->pdata->ptrs[cpu];
+ _statistic_add_sparse(slist, value, incr);
+}
+
+static void statistic_set_sparse(struct statistic *stat, s64 value, u64 total)
+{
+ struct statistic_sparse_list *slist = (struct statistic_sparse_list *)
+ stat->pdata;
+ struct list_head *head = &slist->entry_lh;
+ struct statistic_entry_sparse *entry;
+
+ list_for_each_entry(entry, head, list) {
+ if (likely(entry->value == value)) {
+ entry->hits = total;
+ statistic_add_sparse_sort(head, entry);
+ return;
+ }
+ }
+ if (unlikely(statistic_add_sparse_new(slist, value, total)))
+ slist->hits_missed += total;
+}
+
+static void statistic_merge_sparse(struct statistic *stat,
+ void *_dst, void *_src)
+{
+ struct statistic_sparse_list *dst = _dst, *src = _src;
+ struct statistic_entry_sparse *entry;
+ dst->hits_missed += src->hits_missed;
+ list_for_each_entry(entry, &src->entry_lh, list)
+ _statistic_add_sparse(dst, entry->value, entry->hits);
+}
+
+static int statistic_fdata_sparse(struct statistic *stat, const char *name,
+ struct statistic_file_private *fpriv,
+ void *data)
+{
+ struct sgrb_seg *seg;
+ struct statistic_sparse_list *slist = data;
+ struct statistic_entry_sparse *entry;
+
+ seg = sgrb_seg_find(&fpriv->read_seg_lh, 256);
+ if (unlikely(!seg))
+ return -ENOMEM;
+ seg->offset += sprintf(seg->address + seg->offset, "%s missed 0x%Lu\n",
+ name, (unsigned long long)slist->hits_missed);
+ list_for_each_entry(entry, &slist->entry_lh, list) {
+ seg = sgrb_seg_find(&fpriv->read_seg_lh, 256);
+ if (unlikely(!seg))
+ return -ENOMEM;
+ seg->offset += sprintf(seg->address + seg->offset,
+ "%s 0x%Lx %Lu\n", name,
+ (signed long long)entry->value,
+ (unsigned long long)entry->hits);
+ }
+ return 0;
+}
+
+static int statistic_fdef_sparse(struct statistic *stat, char *line)
+{
+ return sprintf(line, " entries=%u", stat->u.sparse.entries_max);
+}
+
+static match_table_t statistic_match_sparse = {
+ {1, "entries=%u"},
+ {9, NULL}
+};
+
+static int statistic_parse_sparse(struct statistic *stat,
+ struct statistic_info *info,
+ int type, char *def)
+{
+ char *p;
+ substring_t args[MAX_OPT_ARGS];
+
+ while ((p = strsep(&def, " ")) != NULL) {
+ if (!*p)
+ continue;
+ if (match_token(p, statistic_match_sparse, args) == 1) {
+ statistic_transition(stat, info,
+ STATISTIC_STATE_UNCONFIGURED);
+ match_int(&args[0], &stat->u.sparse.entries_max);
+ return 0;
+ }
+ }
+ return -EINVAL;
+}
+
+/* code mostly concerned with managing statistics */
+
+static struct statistic_discipline statistic_discs[] = {
+ { /* STATISTIC_TYPE_COUNTER_INC */
+ NULL,
+ statistic_alloc_generic,
+ NULL,
+ statistic_reset_counter,
+ statistic_merge_counter,
+ statistic_fdata_counter,
+ NULL,
+ statistic_add_counter_inc,
+ statistic_set_counter_inc,
+ "counter_inc", sizeof(u64)
+ },
+ { /* STATISTIC_TYPE_COUNTER_PROD */
+ NULL,
+ statistic_alloc_generic,
+ NULL,
+ statistic_reset_counter,
+ statistic_merge_counter,
+ statistic_fdata_counter,
+ NULL,
+ statistic_add_counter_prod,
+ statistic_set_counter_prod,
+ "counter_prod", sizeof(u64)
+ },
+ { /* STATISTIC_TYPE_UTIL */
+ NULL,
+ statistic_alloc_generic,
+ NULL,
+ statistic_reset_util,
+ statistic_merge_util,
+ statistic_fdata_util,
+ NULL,
+ statistic_add_util,
+ statistic_set_util,
+ "utilisation", sizeof(struct statistic_entry_util)
+ },
+ { /* STATISTIC_TYPE_HISTOGRAM_LIN */
+ statistic_parse_histogram,
+ statistic_alloc_histogram,
+ NULL,
+ statistic_reset_histogram,
+ statistic_merge_histogram,
+ statistic_fdata_histogram,
+ statistic_fdef_histogram,
+ statistic_add_histogram_lin,
+ statistic_set_histogram_lin,
+ "histogram_lin", sizeof(u64)
+ },
+ { /* STATISTIC_TYPE_HISTOGRAM_LOG2 */
+ statistic_parse_histogram,
+ statistic_alloc_histogram,
+ NULL,
+ statistic_reset_histogram,
+ statistic_merge_histogram,
+ statistic_fdata_histogram,
+ statistic_fdef_histogram,
+ statistic_add_histogram_log2,
+ statistic_set_histogram_log2,
+ "histogram_log2", sizeof(u64)
+ },
+ { /* STATISTIC_TYPE_SPARSE */
+ statistic_parse_sparse,
+ statistic_alloc_sparse,
+ statistic_free_sparse,
+ statistic_free_sparse, /* reset equals free */
+ statistic_merge_sparse,
+ statistic_fdata_sparse,
+ statistic_fdef_sparse,
+ statistic_add_sparse,
+ statistic_set_sparse,
+ "sparse", sizeof(struct statistic_sparse_list)
+ },
+ { /* STATISTIC_TYPE_NONE */ }
+};
+
+postcore_initcall(statistic_init);
+module_exit(statistic_exit);
+
+MODULE_LICENSE("GPL");
diff -Nurp a/lib/Makefile b/lib/Makefile
--- a/lib/Makefile 2006-05-19 15:44:27.000000000 +0200
+++ b/lib/Makefile 2006-05-19 16:02:23.000000000 +0200
@@ -47,6 +47,8 @@ obj-$(CONFIG_TEXTSEARCH_KMP) += ts_kmp.o
obj-$(CONFIG_TEXTSEARCH_BM) += ts_bm.o
obj-$(CONFIG_TEXTSEARCH_FSM) += ts_fsm.o

+obj-$(CONFIG_STATISTICS) += statistic.o
+
obj-$(CONFIG_SWIOTLB) += swiotlb.o

hostprogs-y := gen_crc32table
diff -Nurp a/arch/s390/Kconfig b/arch/s390/Kconfig
--- a/arch/s390/Kconfig 2006-05-19 15:44:22.000000000 +0200
+++ b/arch/s390/Kconfig 2006-05-19 16:02:23.000000000 +0200
@@ -474,8 +474,14 @@ source "drivers/net/Kconfig"

source "fs/Kconfig"

+menu "Instrumentation Support"
+
source "arch/s390/oprofile/Kconfig"

+source "lib/Kconfig.statistic"
+
+endmenu
+
source "arch/s390/Kconfig.debug"

source "security/Kconfig"
diff -Nurp a/lib/Kconfig.statistic b/lib/Kconfig.statistic
--- a/lib/Kconfig.statistic 1970-01-01 01:00:00.000000000 +0100
+++ b/lib/Kconfig.statistic 2006-05-19 16:02:23.000000000 +0200
@@ -0,0 +1,11 @@
+config STATISTICS
+ bool "Statistics infrastructure"
+ depends on DEBUG_FS
+ help
+ The statistics infrastructure provides a debug-fs based user interface
+ for statistics of kernel components, that is, usually device drivers.
+ Statistics are available for components that have been instrumented to
+ feed data into the statistics infrastructure.
+ This feature is useful for performance measurements or performance
+ debugging.
+ If in doubt, say "N".
diff -Nurp a/arch/s390/oprofile/Kconfig b/arch/s390/oprofile/Kconfig
--- a/arch/s390/oprofile/Kconfig 2006-03-20 06:53:29.000000000 +0100
+++ b/arch/s390/oprofile/Kconfig 2006-05-19 16:02:23.000000000 +0200
@@ -1,6 +1,3 @@
-
-menu "Profiling support"
-
config PROFILING
bool "Profiling support"
help
@@ -18,5 +15,3 @@ config OPROFILE

If unsure, say N.

-endmenu
-
diff -Nurp a/MAINTAINERS b/MAINTAINERS
--- a/MAINTAINERS 2006-05-19 15:44:32.000000000 +0200
+++ b/MAINTAINERS 2006-05-19 16:02:23.000000000 +0200
@@ -2633,6 +2633,13 @@ STARMODE RADIO IP (STRIP) PROTOCOL DRIVE
W: http://mosquitonet.Stanford.EDU/strip.html
S: Unsupported ?

+STATISTICS INFRASTRUCTURE
+P: Martin Peschke
+M: [email protected]
+M: [email protected]
+W: http://www.ibm.com/developerworks/linux/linux390/
+S: Supported
+
STRADIS MPEG-2 DECODER DRIVER
P: Nathan Laredo
M: [email protected]



2006-05-24 22:54:52

by Andrew Morton

[permalink] [raw]
Subject: Re: [Patch 5/6] statistics infrastructure

Martin Peschke <[email protected]> wrote:
>

It would be great to have a non-s390 exploiter of this code. So more
people could try it out. Is that much work?

One assumes that there's some subsytem or driver which has a real-life need
for such instrumentation, although I don't know which one that would be.
(And if there is no such subsystem then that's rather a black mark for
merging all this code!)

Thoughts?

> ...
>
> +struct statistic_discipline {
> + int (*parse)(struct statistic *, struct statistic_info *, int, char *);
> + void* (*alloc)(struct statistic *, size_t, gfp_t, int);
> + void (*free)(struct statistic *, void *);
> + void (*reset)(struct statistic *, void *);
> + void (*merge)(struct statistic *, void *, void*);
> + int (*fdata)(struct statistic *, const char *,
> + struct statistic_file_private *, void *);
> + int (*fdef)(struct statistic *, char *);
> + void (*add)(struct statistic *, int, s64, u64);
> + void (*set)(struct statistic *, s64, u64);
> + char *name;
> + size_t size;
> +};

This practice of omitting the variable names drives me up the wall, sorry.
Look at the definition of `add' then fall down dazed and confused.

This is particularly true of these function-pointer style declarations.
For example, do:

$EDITOR -t aio_read

and you end up here:

ssize_t (*aio_read) (struct kiocb *, char __user *, size_t, loff_t);

which is uninformative. You have to go and hunt down an instance of an
aio_read() implementation to actually be sure what those args are doing.

So I think putting the nicely-chosen variable names in there is quite
helpful.

> +#ifdef CONFIG_STATISTICS
> +
> +extern int statistic_create(struct statistic_interface *, const char *);
> +extern int statistic_remove(struct statistic_interface *);
> +
> +/**
> + * statistic_add - update statistic with incremental data in (X, Y) pair
> + * @stat: struct statistic array
> + * @i: index of statistic to be updated
> + * @value: X
> + * @incr: Y
> + *
> + * The actual processing of the (X, Y) data pair is determined by the current
> + * the definition applied to the statistic. See Documentation/statistics.txt.
> + *
> + * This variant takes care of protecting per-cpu data. It is preferred whenever
> + * exploiters don't update several statistics of the same entity in one go.
> + */
> +static inline void statistic_add(struct statistic *stat, int i,
> + s64 value, u64 incr)
> +{
> + unsigned long flags;
> + local_irq_save(flags);
> + if (stat[i].state == STATISTIC_STATE_ON)
> + stat[i].add(&stat[i], smp_processor_id(), value, incr);
> + local_irq_restore(flags);
> +}

afaict this isn't actually used?

If it is, and assuming this is only accessed via a function pointer (the
mysterious `add' method) then there's not a lot of point in inlining it.

Except if this code really isn't called, then inlining it will avoid having
an unused piece of code in vmlinux.

But if it _is_ used, and it has multiple users then we end up with multiple
copies in vmlinux.

So what's up with that?

And elsewhere we have:

> +static inline void statistic_add(struct statistic *stat, int i,
> + s64 value, u64 incr)
> +{
> +}
> +

Do we expect this to have any callers if !CONFIG_STATISTICS?


> +static int statistic_free(struct statistic *stat, struct statistic_info *info)
> +{
> + int cpu;
> + stat->state = STATISTIC_STATE_RELEASED;
> + if (unlikely(info->flags & STATISTIC_FLAGS_NOINCR)) {
> + statistic_free_ptr(stat, stat->pdata);
> + stat->pdata = NULL;
> + return 0;
> + }
> + for_each_cpu(cpu) {

for_each_cpu() is on death row. Replace it with for_each_possible_cpu().
If that is indeed appropriate - perhaps you meant online_cpu, or
present_cpu.

> +static void * statistic_alloc_generic(struct statistic *stat, size_t size,
^

unwelcome space ;)

> +static int statistic_alloc(struct statistic *stat,
> + struct statistic_info *info)
> +{
> + int cpu;
> + stat->age = sched_clock();

argh. Didn't we end up finding a way to avoid this?

At the least, we should have statistics_clock(), or nsec_clock(), or
something which is decoupled from this low-level scheduler-internal thing,
and which architectures can implement (vis attribute-weak) if they have a
preferred/better/more-accurate alternative.


> + if (unlikely(info->flags & STATISTIC_FLAGS_NOINCR)) {
> + stat->pdata = statistic_alloc_ptr(stat, GFP_KERNEL, -1);
> + if (unlikely(!stat->pdata))
> + return -ENOMEM;
> + stat->state = STATISTIC_STATE_OFF;
> + return 0;
> + }
> + stat->pdata = kzalloc(sizeof(struct percpu_data), GFP_KERNEL);
> + if (unlikely(!stat->pdata))
> + return -ENOMEM;
> + for_each_online_cpu(cpu) {

hmn. Now we're using only the online CPUs. Ah, OK, you have a cpu-hotplug
handler.

> +static int statistic_transition(struct statistic *stat,
> + struct statistic_info *info,
> + enum statistic_state requested_state)
> +{
> + int z = (requested_state < stat->state ? 1 : 0);
> + int retval = -EINVAL;
> +
> + while (stat->state != requested_state) {
> + switch (stat->state) {
> + case STATISTIC_STATE_INVALID:
> + retval = ( z ? -EINVAL : statistic_initialise(stat) );
> + break;
> + case STATISTIC_STATE_UNCONFIGURED:
> + retval = ( z ? statistic_uninitialise(stat)
> + : statistic_define(stat) );
> + break;
> + case STATISTIC_STATE_RELEASED:
> + retval = ( z ? statistic_initialise(stat)
> + : statistic_alloc(stat, info) );
> + break;
> + case STATISTIC_STATE_OFF:
> + retval = ( z ? statistic_free(stat, info)
> + : statistic_start(stat) );
> + break;
> + case STATISTIC_STATE_ON:
> + retval = ( z ? statistic_stop(stat) : -EINVAL );
> + break;

Lots of unneeded parentheses there.

> +static int statistic_reset(struct statistic *stat, struct statistic_info *info)
> +{
> + enum statistic_state prev_state = stat->state;
> + int cpu;
> +
> + if (unlikely(stat->state < STATISTIC_STATE_OFF))
> + return 0;
> + statistic_transition(stat, info, STATISTIC_STATE_OFF);
> + if (unlikely(info->flags & STATISTIC_FLAGS_NOINCR))
> + statistic_reset_ptr(stat, stat->pdata);
> + else
> + for_each_cpu(cpu)

for_each_possible_cpu() (maybe)

> +static inline int statistic_fdata(struct statistic_interface *interface, int i,
> + struct statistic_file_private *fpriv)
> +{
> + struct statistic *stat = &interface->stat[i];
> + struct statistic_info *info = &interface->info[i];
> + struct statistic_discipline *disc = &statistic_discs[stat->type];
> + struct statistic_merge_private mpriv;
> + int retval;
> +
> + if (unlikely(stat->state < STATISTIC_STATE_OFF))
> + return 0;
> + if (unlikely(info->flags & STATISTIC_FLAGS_NOINCR))
> + return disc->fdata(stat, info->name, fpriv, stat->pdata);
> + mpriv.dst = statistic_alloc_ptr(stat, GFP_KERNEL, -1);
> + if (unlikely(!mpriv.dst))
> + return -ENOMEM;
> + spin_lock_init(&mpriv.lock);
> + mpriv.stat = stat;
> + on_each_cpu(statistic_merge, &mpriv, 0, 1);
> + retval = disc->fdata(stat, info->name, fpriv, mpriv.dst);
> + statistic_free_ptr(stat, mpriv.dst);
> + return retval;
> +}

You do like that `inline' thingy ;)

> +/* cpu hotplug handling for per-cpu data */
> +
> +static inline int _statistic_hotcpu(struct statistic_interface *interface,
> + int i, unsigned long action, int cpu)
> +{
> + struct statistic *stat = &interface->stat[i];
> + struct statistic_info *info = &interface->info[i];
> +
> + if (unlikely(info->flags & STATISTIC_FLAGS_NOINCR))
> + return 0;
> + if (stat->state < STATISTIC_STATE_OFF)
> + return 0;
> + switch (action) {
> + case CPU_UP_PREPARE:
> + stat->pdata->ptrs[cpu] = statistic_alloc_ptr(stat, GFP_ATOMIC,
> + cpu_to_node(cpu));
> + break;

So this allocation can fail. Does all the other code handle that? If not,
we should fail the CPU bringup.

Dangit, this is inlined as well. It makes oops-tracing really hard :(

> +{
> + statistic_root_dir = debugfs_create_dir(STATISTIC_ROOT_DIR, NULL);
> + if (unlikely(!statistic_root_dir))
> + return -ENOMEM;
> + INIT_LIST_HEAD(&statistic_list);
> + mutex_init(&statistic_list_mutex);
> + register_cpu_notifier(&statistic_hotcpu_notifier);

Actually, this can fail too (well, actually it can't, but the API suggests
it can).

> + int offset;
> + struct list_head line_lh;
> + char *nl;
> + size_t line_size = 0;
> +
> + INIT_LIST_HEAD(&line_lh);

LIST_HEAD(line_lh);


> +
> +/* code concerned with utilisation indicator statistic */
> +
> +static void statistic_reset_util(struct statistic *stat, void *ptr)
> +{
> + struct statistic_entry_util *util = ptr;
> + util->num = 0;
> + util->acc = 0;
> + util->min = (~0ULL >> 1) - 1;
> + util->max = -(~0ULL >> 1) + 1;
> +}

`min' is a large positive number and `max' is a large negative one. Is that
right?

`min' gets 0x7ffffffffffffffe, which seems to be off-by-one.

Consider using LLONG_MAX and friends.

> +static void statistic_add_util(struct statistic *stat, int cpu,
> + s64 value, u64 incr)
> +{
> + struct statistic_entry_util *util = stat->pdata->ptrs[cpu];
> + util->num += incr;
> + util->acc += value * incr;
> + if (unlikely(value < util->min))
> + util->min = value;
> + if (unlikely(value > util->max))
> + util->max = value;
> +}

ah, I get it! `min' isn't minimum-allowable. It's
minimum-it-has-ever-been. Makes sense now.

> +static int statistic_fdata_util(struct statistic *stat, const char *name,
> + struct statistic_file_private *fpriv,
> + void *data)
> +{
> + struct sgrb_seg *seg;
> + struct statistic_entry_util *util = data;
> + unsigned long long whole = 0;
> + signed long long min = 0, max = 0, decimal = 0, last_digit;
> +
> + seg = sgrb_seg_find(&fpriv->read_seg_lh, 128);
> + if (unlikely(!seg))
> + return -ENOMEM;
> + if (likely(util->num)) {
> + whole = util->acc;
> + do_div(whole, util->num);
> + decimal = util->acc * 10000;
> + do_div(decimal, util->num);
> + decimal -= whole * 10000;
> + if (decimal < 0)
> + decimal = -decimal;
> + last_digit = decimal;
> + do_div(last_digit, 10);
> + last_digit = decimal - last_digit * 10;
> + if (last_digit >= 5)
> + decimal += 10;
> + do_div(decimal, 10);
> + min = util->min;
> + max = util->max;
> + }
> + seg->offset += sprintf(seg->address + seg->offset,
> + "%s %Lu %Ld %Ld.%03lld %Ld\n", name,
> + (unsigned long long)util->num,
> + (signed long long)min, whole, decimal,
> + (signed long long)max);

There's no need to cast `min' and `max' here. A cast would be needed if
they were u64/s64.

> +
> +static inline int statistic_add_sparse_new(struct statistic_sparse_list *slist,
> + s64 value, u64 incr)
> +{
> + struct statistic_entry_sparse *entry;
> +
> + if (unlikely(slist->entries == slist->entries_max))
> + return -ENOMEM;
> + entry = kmalloc(sizeof(struct statistic_entry_sparse), GFP_ATOMIC);
> + if (unlikely(!entry))
> + return -ENOMEM;
> + entry->value = value;
> + entry->hits = incr;
> + slist->entries++;
> + list_add_tail(&entry->list, &slist->entry_lh);
> + return 0;
> +}
>
> +static inline void _statistic_add_sparse(struct statistic_sparse_list *slist,
> + s64 value, u64 incr)
> +{
> + struct list_head *head = &slist->entry_lh;
> + struct statistic_entry_sparse *entry;
> +
> + list_for_each_entry(entry, head, list) {
> + if (likely(entry->value == value)) {
> + entry->hits += incr;
> + statistic_add_sparse_sort(head, entry);
> + return;
> + }
> + }
> + if (unlikely(statistic_add_sparse_new(slist, value, incr)))
> + slist->hits_missed += incr;
> +}

I hereby revoke your inlining license.

> +static void statistic_set_sparse(struct statistic *stat, s64 value, u64 total)
> +{
> + struct statistic_sparse_list *slist = (struct statistic_sparse_list *)
> + stat->pdata;

Hang on, what's happening here? statistic.pdata is `struct percpu_data *'.
That's

struct percpu_data {
void *ptrs[NR_CPUS];
};

How can we cast that to a statistic_sparse_list* and then start playing
with it? We're supposed to use per_cpu_ptr() to get at the actual data.


2006-05-25 08:09:54

by Nikita Danilov

[permalink] [raw]
Subject: Re: [Patch 5/6] statistics infrastructure

Martin Peschke writes:
> This patch adds statistics infrastructure as common code.
>

[...]

> +
> +static void statistic_add_util(struct statistic *stat, int cpu,
> + s64 value, u64 incr)
> +{
> + struct statistic_entry_util *util = stat->pdata->ptrs[cpu];
> + util->num += incr;
> + util->acc += value * incr;
> + if (unlikely(value < util->min))
> + util->min = value;
> + if (unlikely(value > util->max))
> + util->max = value;

One useful aggregate that can be calculated here is a standard
deviation. Something like

util->acc_sq += value * incr * value * incr; /* sum of squares */

And in statistic_fdata_util() squared standard deviation is

std_dev = util->acc_sq;
/*
* Difference of averaged square and squared average.
*/
std_dev = do_div(std_dev, util->num) - whole * whole;

> +}

Nikita.

2006-05-29 22:17:53

by Martin Peschke

[permalink] [raw]
Subject: [Patch] statistics infrastructure - update 1

Andrew, please apply.

changelog:
- nsec_to_timestamp: u64 is preferred type for kernel's nanoseconds
- improve readability of function prototypes
- fail cpu bringup if out of memory
- use LLONG* constants and fix off-by-one
- nifty list head initialisation
- remove unneeded cast
- remove unneeded parenthesis
- remove unwelcome spaces
- be more careful with inlining
- for_each_cpu() is on death row

Signed-off-by: Martin Peschke <[email protected]>
---

include/linux/jiffies.h | 2
include/linux/statistic.h | 22 ++++----
lib/statistic.c | 122 +++++++++++++++++++++-------------------------
3 files changed, 71 insertions(+), 75 deletions(-)

--- a/include/linux/jiffies.h 24 May 2006 09:28:36 -0000 1.12
+++ b/include/linux/jiffies.h 26 May 2006 15:35:47 -0000 1.13
@@ -447,7 +447,7 @@
return x;
}

-static inline int nsec_to_timestamp(char *s, unsigned long long t)
+static inline int nsec_to_timestamp(char *s, u64 t)
{
unsigned long nsec_rem = do_div(t, NSEC_PER_SEC);
return sprintf(s, "[%5lu.%06lu]", (unsigned long)t,
--- a/include/linux/statistic.h 19 May 2006 11:08:16 -0000 1.22
+++ b/include/linux/statistic.h 29 May 2006 20:12:42 -0000
@@ -156,16 +156,18 @@
* A data area of a data processing mode always has to look the same.
*/
struct statistic_discipline {
- int (*parse)(struct statistic *, struct statistic_info *, int, char *);
- void* (*alloc)(struct statistic *, size_t, gfp_t, int);
- void (*free)(struct statistic *, void *);
- void (*reset)(struct statistic *, void *);
- void (*merge)(struct statistic *, void *, void*);
- int (*fdata)(struct statistic *, const char *,
- struct statistic_file_private *, void *);
- int (*fdef)(struct statistic *, char *);
- void (*add)(struct statistic *, int, s64, u64);
- void (*set)(struct statistic *, s64, u64);
+ int (*parse)(struct statistic * stat, struct statistic_info *info,
+ int type, char *def);
+ void* (*alloc)(struct statistic *stat, size_t size, gfp_t flags,
+ int node);
+ void (*free)(struct statistic *stat, void *ptr);
+ void (*reset)(struct statistic *stat, void *ptr);
+ void (*merge)(struct statistic *stat, void *dst, void *src);
+ int (*fdata)(struct statistic *stat, const char *name,
+ struct statistic_file_private *fpriv, void *data);
+ int (*fdef)(struct statistic *stat, char *line);
+ void (*add)(struct statistic *stat, int cpu, s64 value, u64 incr);
+ void (*set)(struct statistic *stat, s64 value, u64 total);
char *name;
size_t size;
};
--- a/lib/statistic.c 19 May 2006 14:12:58 -0000 1.36
+++ b/lib/statistic.c 29 May 2006 20:12:42 -0000
@@ -64,20 +64,20 @@

static struct statistic_discipline statistic_discs[];

-static inline int statistic_initialise(struct statistic *stat)
+static int statistic_initialise(struct statistic *stat)
{
stat->type = STATISTIC_TYPE_NONE;
stat->state = STATISTIC_STATE_UNCONFIGURED;
return 0;
}

-static inline int statistic_uninitialise(struct statistic *stat)
+static int statistic_uninitialise(struct statistic *stat)
{
stat->state = STATISTIC_STATE_INVALID;
return 0;
}

-static inline int statistic_define(struct statistic *stat)
+static int statistic_define(struct statistic *stat)
{
if (stat->type == STATISTIC_TYPE_NONE)
return -EINVAL;
@@ -85,14 +85,14 @@
return 0;
}

-static inline void statistic_reset_ptr(struct statistic *stat, void *ptr)
+static void statistic_reset_ptr(struct statistic *stat, void *ptr)
{
struct statistic_discipline *disc = &statistic_discs[stat->type];
if (ptr)
disc->reset(stat, ptr);
}

-static inline void statistic_move_ptr(struct statistic *stat, void *src)
+static void statistic_move_ptr(struct statistic *stat, void *src)
{
struct statistic_discipline *disc = &statistic_discs[stat->type];
unsigned long flags;
@@ -101,7 +101,7 @@
local_irq_restore(flags);
}

-static inline void statistic_free_ptr(struct statistic *stat, void *ptr)
+static void statistic_free_ptr(struct statistic *stat, void *ptr)
{
struct statistic_discipline *disc = &statistic_discs[stat->type];
if (ptr) {
@@ -120,7 +120,7 @@
stat->pdata = NULL;
return 0;
}
- for_each_cpu(cpu) {
+ for_each_possible_cpu(cpu) {
statistic_free_ptr(stat, stat->pdata->ptrs[cpu]);
stat->pdata->ptrs[cpu] = NULL;
}
@@ -129,13 +129,13 @@
return 0;
}

-static void * statistic_alloc_generic(struct statistic *stat, size_t size,
- gfp_t flags, int node)
+static void *statistic_alloc_generic(struct statistic *stat, size_t size,
+ gfp_t flags, int node)
{
return kmalloc_node(size, flags, node);
}

-static void * statistic_alloc_ptr(struct statistic *stat, gfp_t flags, int node)
+static void *statistic_alloc_ptr(struct statistic *stat, gfp_t flags, int node)
{
struct statistic_discipline *disc = &statistic_discs[stat->type];
void *buf = disc->alloc(stat, disc->size, flags, node);
@@ -171,7 +171,7 @@
return 0;
}

-static inline int statistic_start(struct statistic *stat)
+static int statistic_start(struct statistic *stat)
{
stat->started = sched_clock();
stat->state = STATISTIC_STATE_ON;
@@ -182,7 +182,7 @@
{
}

-static inline int statistic_stop(struct statistic *stat)
+static int statistic_stop(struct statistic *stat)
{
stat->stopped = sched_clock();
stat->state = STATISTIC_STATE_OFF;
@@ -196,28 +196,28 @@
struct statistic_info *info,
enum statistic_state requested_state)
{
- int z = (requested_state < stat->state ? 1 : 0);
+ int z = requested_state < stat->state ? 1 : 0;
int retval = -EINVAL;

while (stat->state != requested_state) {
switch (stat->state) {
case STATISTIC_STATE_INVALID:
- retval = ( z ? -EINVAL : statistic_initialise(stat) );
+ retval = z ? -EINVAL : statistic_initialise(stat);
break;
case STATISTIC_STATE_UNCONFIGURED:
- retval = ( z ? statistic_uninitialise(stat)
- : statistic_define(stat) );
+ retval = z ? statistic_uninitialise(stat)
+ : statistic_define(stat);
break;
case STATISTIC_STATE_RELEASED:
- retval = ( z ? statistic_initialise(stat)
- : statistic_alloc(stat, info) );
+ retval = z ? statistic_initialise(stat)
+ : statistic_alloc(stat, info);
break;
case STATISTIC_STATE_OFF:
- retval = ( z ? statistic_free(stat, info)
- : statistic_start(stat) );
+ retval = z ? statistic_free(stat, info)
+ : statistic_start(stat);
break;
case STATISTIC_STATE_ON:
- retval = ( z ? statistic_stop(stat) : -EINVAL );
+ retval = z ? statistic_stop(stat) : -EINVAL;
break;
}
if (unlikely(retval))
@@ -237,7 +237,7 @@
if (unlikely(info->flags & STATISTIC_FLAGS_NOINCR))
statistic_reset_ptr(stat, stat->pdata);
else
- for_each_cpu(cpu)
+ for_each_possible_cpu(cpu)
statistic_reset_ptr(stat, stat->pdata->ptrs[cpu]);
stat->age = sched_clock();
statistic_transition(stat, info, prev_state);
@@ -279,7 +279,7 @@
disc->set(&stat[i], value, total);
}

-static struct sgrb_seg * sgrb_seg_find(struct list_head *lh, int size)
+static struct sgrb_seg *sgrb_seg_find(struct list_head *lh, int size)
{
struct sgrb_seg *seg;

@@ -313,7 +313,7 @@
}
}

-static char * statistic_state_strings[] = {
+static char *statistic_state_strings[] = {
"undefined(BUG)",
"unconfigured",
"released",
@@ -360,8 +360,8 @@
return 0;
}

-static inline int statistic_fdata(struct statistic_interface *interface, int i,
- struct statistic_file_private *fpriv)
+static int statistic_fdata(struct statistic_interface *interface, int i,
+ struct statistic_file_private *fpriv)
{
struct statistic *stat = &interface->stat[i];
struct statistic_info *info = &interface->info[i];
@@ -386,8 +386,8 @@

/* cpu hotplug handling for per-cpu data */

-static inline int _statistic_hotcpu(struct statistic_interface *interface,
- int i, unsigned long action, int cpu)
+static int _statistic_hotcpu(struct statistic_interface *interface,
+ int i, unsigned long action, int cpu)
{
struct statistic *stat = &interface->stat[i];
struct statistic_info *info = &interface->info[i];
@@ -400,6 +400,8 @@
case CPU_UP_PREPARE:
stat->pdata->ptrs[cpu] = statistic_alloc_ptr(stat, GFP_ATOMIC,
cpu_to_node(cpu));
+ if (!stat->pdata->ptrs[cpu])
+ return -ENOMEM;
break;
case CPU_UP_CANCELED:
case CPU_DEAD:
@@ -417,15 +419,19 @@
static int __cpuinit statistic_hotcpu(struct notifier_block *notifier,
unsigned long action, void *__cpu)
{
- int cpu = (unsigned long)__cpu, i;
+ int cpu = (unsigned long)__cpu, i, retval = 0;
struct statistic_interface *interface;

mutex_lock(&statistic_list_mutex);
list_for_each_entry(interface, &statistic_list, list)
- for (i = 0; i < interface->number; i++)
- _statistic_hotcpu(interface, i, action, cpu);
+ for (i = 0; i < interface->number; i++) {
+ retval = _statistic_hotcpu(interface, i, action, cpu);
+ if (retval)
+ goto unlock;
+ }
+unlock:
mutex_unlock(&statistic_list_mutex);
- return NOTIFY_OK;
+ return (retval ? NOTIFY_BAD : NOTIFY_OK);
}

static struct notifier_block statistic_hotcpu_notifier =
@@ -702,11 +708,10 @@
struct statistic_file_private *private = file->private_data;
struct sgrb_seg *seg, *seg_nl;
int offset;
- struct list_head line_lh;
+ LIST_HEAD(line_lh);
char *nl;
size_t line_size = 0;

- INIT_LIST_HEAD(&line_lh);
list_for_each_entry(seg, &private->write_seg_lh, list) {
for (offset = 0; offset < seg->offset; offset += seg_nl->size) {
seg_nl = kmalloc(sizeof(struct sgrb_seg), GFP_KERNEL);
@@ -896,7 +901,7 @@
}

static void statistic_add_counter_inc(struct statistic *stat, int cpu,
- s64 value, u64 incr)
+ s64 value, u64 incr)
{
*(u64*)stat->pdata->ptrs[cpu] += incr;
}
@@ -949,8 +954,8 @@
struct statistic_entry_util *util = ptr;
util->num = 0;
util->acc = 0;
- util->min = (~0ULL >> 1) - 1;
- util->max = -(~0ULL >> 1) + 1;
+ util->min = LLONG_MAX;
+ util->max = LLONG_MIN;
}

static void statistic_add_util(struct statistic *stat, int cpu,
@@ -1020,15 +1025,14 @@
seg->offset += sprintf(seg->address + seg->offset,
"%s %Lu %Ld %Ld.%03lld %Ld\n", name,
(unsigned long long)util->num,
- (signed long long)min, whole, decimal,
- (signed long long)max);
+ min, whole, decimal, max);
return 0;
}

/* code concerned with histogram statistics */

-static void * statistic_alloc_histogram(struct statistic *stat, size_t size,
- gfp_t flags, int node)
+static void *statistic_alloc_histogram(struct statistic *stat, size_t size,
+ gfp_t flags, int node)
{
return kmalloc_node(size * (stat->u.histogram.last_index + 1),
flags, node);
@@ -1048,7 +1052,7 @@
(i ? (stat->u.histogram.base_interval << (i - 1)) : 0);
}

-static inline s64 statistic_histogram_calc_value(struct statistic *stat, int i)
+static s64 statistic_histogram_calc_value(struct statistic *stat, int i)
{
if (stat->type == STATISTIC_TYPE_HISTOGRAM_LIN)
return statistic_histogram_calc_value_lin(stat, i);
@@ -1056,16 +1060,15 @@
return statistic_histogram_calc_value_log2(stat, i);
}

-static inline int statistic_histogram_calc_index_lin(struct statistic *stat,
- s64 value)
+static int statistic_histogram_calc_index_lin(struct statistic *stat, s64 value)
{
unsigned long long i = value - stat->u.histogram.range_min;
do_div(i, stat->u.histogram.base_interval);
return i;
}

-static inline int statistic_histogram_calc_index_log2(struct statistic *stat,
- s64 value)
+static int statistic_histogram_calc_index_log2(struct statistic *stat,
+ s64 value)
{
unsigned long long i;
for (i = 0;
@@ -1075,15 +1078,6 @@
return i;
}

-static inline int statistic_histogram_calc_index(struct statistic *stat,
- s64 value)
-{
- if (stat->type == STATISTIC_TYPE_HISTOGRAM_LIN)
- return statistic_histogram_calc_index_lin(stat, value);
- else
- return statistic_histogram_calc_index_log2(stat, value);
-}
-
static void statistic_reset_histogram(struct statistic *stat, void *ptr)
{
memset(ptr, 0, (stat->u.histogram.last_index + 1) * sizeof(u64));
@@ -1126,7 +1120,7 @@
dst[i] += src[i];
}

-static inline int statistic_fdata_histogram_line(const char *name,
+static int statistic_fdata_histogram_line(const char *name,
struct statistic_file_private *private,
const char *prefix, s64 bound, u64 hits)
{
@@ -1216,8 +1210,8 @@

/* code concerned with histograms (discrete value) statistics */

-static void * statistic_alloc_sparse(struct statistic *stat, size_t size,
- gfp_t flags, int node)
+static void *statistic_alloc_sparse(struct statistic *stat, size_t size,
+ gfp_t flags, int node)
{
struct statistic_sparse_list *slist = kmalloc_node(size, flags, node);
INIT_LIST_HEAD(&slist->entry_lh);
@@ -1237,8 +1231,8 @@
slist->entries = 0;
}

-static inline void statistic_add_sparse_sort(struct list_head *head,
- struct statistic_entry_sparse *entry)
+static void statistic_add_sparse_sort(struct list_head *head,
+ struct statistic_entry_sparse *entry)
{
struct statistic_entry_sparse *sort =
list_prepare_entry(entry, head, list);
@@ -1251,8 +1245,8 @@
list_move(&entry->list, &sort->list);
}

-static inline int statistic_add_sparse_new(struct statistic_sparse_list *slist,
- s64 value, u64 incr)
+static int statistic_add_sparse_new(struct statistic_sparse_list *slist,
+ s64 value, u64 incr)
{
struct statistic_entry_sparse *entry;

@@ -1268,8 +1262,8 @@
return 0;
}

-static inline void _statistic_add_sparse(struct statistic_sparse_list *slist,
- s64 value, u64 incr)
+static void _statistic_add_sparse(struct statistic_sparse_list *slist,
+ s64 value, u64 incr)
{
struct list_head *head = &slist->entry_lh;
struct statistic_entry_sparse *entry;




2006-05-29 22:17:26

by Martin Peschke

[permalink] [raw]
Subject: Re: [Patch 5/6] statistics infrastructure

Andrew, please pick up the update that comes in a separate mail.
It contains all changes as indicated below.

The update patch doesn't address parts of your feedback.
For details see also below, please.

Thanks, Martin

On Wed, 2006-05-24 at 15:57 -0700, Andrew Morton wrote:
> Martin Peschke <[email protected]> wrote:
> >
>
> It would be great to have a non-s390 exploiter of this code. So more
> people could try it out. Is that much work?
>
> One assumes that there's some subsytem or driver which has a real-life need
> for such instrumentation, although I don't know which one that would be.
> (And if there is no such subsystem then that's rather a black mark for
> merging all this code!)
>
> Thoughts?

Our initial requirement has been to provide SCSI statistics. I have got
a patch for that (latencies etc.). Unfortunately, it requires more work.
I have found out that SCSI device creation doesn't guarantee process
context. Setup up statistics might call schedule() (see
debugs_create_file).

I am going to discuss the issue with linux-scsi. I have seen patches
that seem to move SCSI device scanning to a kernel thread, which
might be welcome help.

> > ...
> >
> > +struct statistic_discipline {
> > + int (*parse)(struct statistic *, struct statistic_info *, int, char *);
> > + void* (*alloc)(struct statistic *, size_t, gfp_t, int);
> > + void (*free)(struct statistic *, void *);
> > + void (*reset)(struct statistic *, void *);
> > + void (*merge)(struct statistic *, void *, void*);
> > + int (*fdata)(struct statistic *, const char *,
> > + struct statistic_file_private *, void *);
> > + int (*fdef)(struct statistic *, char *);
> > + void (*add)(struct statistic *, int, s64, u64);
> > + void (*set)(struct statistic *, s64, u64);
> > + char *name;
> > + size_t size;
> > +};
>
> This practice of omitting the variable names drives me up the wall, sorry.
> Look at the definition of `add' then fall down dazed and confused.
>
> This is particularly true of these function-pointer style declarations.
> For example, do:
>
> $EDITOR -t aio_read
>
> and you end up here:
>
> ssize_t (*aio_read) (struct kiocb *, char __user *, size_t, loff_t);
>
> which is uninformative. You have to go and hunt down an instance of an
> aio_read() implementation to actually be sure what those args are doing.
>
> So I think putting the nicely-chosen variable names in there is quite
> helpful.

done

> > +#ifdef CONFIG_STATISTICS
> > +
> > +extern int statistic_create(struct statistic_interface *, const char *);
> > +extern int statistic_remove(struct statistic_interface *);
> > +
> > +/**
> > + * statistic_add - update statistic with incremental data in (X, Y) pair
> > + * @stat: struct statistic array
> > + * @i: index of statistic to be updated
> > + * @value: X
> > + * @incr: Y
> > + *
> > + * The actual processing of the (X, Y) data pair is determined by the current
> > + * the definition applied to the statistic. See Documentation/statistics.txt.
> > + *
> > + * This variant takes care of protecting per-cpu data. It is preferred whenever
> > + * exploiters don't update several statistics of the same entity in one go.
> > + */
> > +static inline void statistic_add(struct statistic *stat, int i,
> > + s64 value, u64 incr)
> > +{
> > + unsigned long flags;
> > + local_irq_save(flags);
> > + if (stat[i].state == STATISTIC_STATE_ON)
> > + stat[i].add(&stat[i], smp_processor_id(), value, incr);
> > + local_irq_restore(flags);
> > +}
>
> afaict this isn't actually used?
>
> If it is, and assuming this is only accessed via a function pointer (the
> mysterious `add' method) then there's not a lot of point in inlining it.
>
> Except if this code really isn't called, then inlining it will avoid having
> an unused piece of code in vmlinux.
>
> But if it _is_ used, and it has multiple users then we end up with multiple
> copies in vmlinux.
>
> So what's up with that?

That's one of the few functions that exploiter's are supposed to call
for statistic updates. This inline function then calls the mysterious
add function of the data processing mode that is being used at that
time.

Just revert inlining?

> And elsewhere we have:
>
> > +static inline void statistic_add(struct statistic *stat, int i,
> > + s64 value, u64 incr)
> > +{
> > +}
> > +
>
> Do we expect this to have any callers if !CONFIG_STATISTICS?

This relieves exploiters from the burden of taking care of
!CONFIG_STATISTICS. I think something like this in ten thousand to-be
exploiters would look pretty ugly:

#ifdef CONFIG_STATISTICS
statistic_add(mystat, index, value, incr);
#endif

>
> > +static int statistic_free(struct statistic *stat, struct statistic_info *info)
> > +{
> > + int cpu;
> > + stat->state = STATISTIC_STATE_RELEASED;
> > + if (unlikely(info->flags & STATISTIC_FLAGS_NOINCR)) {
> > + statistic_free_ptr(stat, stat->pdata);
> > + stat->pdata = NULL;
> > + return 0;
> > + }
> > + for_each_cpu(cpu) {
>
> for_each_cpu() is on death row. Replace it with for_each_possible_cpu().
> If that is indeed appropriate - perhaps you meant online_cpu, or
> present_cpu.

done

> > +static void * statistic_alloc_generic(struct statistic *stat, size_t size,
> ^
>
> unwelcome space ;)

done

> > +static int statistic_alloc(struct statistic *stat,
> > + struct statistic_info *info)
> > +{
> > + int cpu;
> > + stat->age = sched_clock();
>
> argh. Didn't we end up finding a way to avoid this?
>
> At the least, we should have statistics_clock(), or nsec_clock(), or
> something which is decoupled from this low-level scheduler-internal thing,
> and which architectures can implement (vis attribute-weak) if they have a
> preferred/better/more-accurate alternative.

That's something to address next. Sorry, didn't manage to change it for
the recent update patch.

> > +static int statistic_transition(struct statistic *stat,
> > + struct statistic_info *info,
> > + enum statistic_state requested_state)
> > +{
> > + int z = (requested_state < stat->state ? 1 : 0);
> > + int retval = -EINVAL;
> > +
> > + while (stat->state != requested_state) {
> > + switch (stat->state) {
> > + case STATISTIC_STATE_INVALID:
> > + retval = ( z ? -EINVAL : statistic_initialise(stat) );
> > + break;
> > + case STATISTIC_STATE_UNCONFIGURED:
> > + retval = ( z ? statistic_uninitialise(stat)
> > + : statistic_define(stat) );
> > + break;
> > + case STATISTIC_STATE_RELEASED:
> > + retval = ( z ? statistic_initialise(stat)
> > + : statistic_alloc(stat, info) );
> > + break;
> > + case STATISTIC_STATE_OFF:
> > + retval = ( z ? statistic_free(stat, info)
> > + : statistic_start(stat) );
> > + break;
> > + case STATISTIC_STATE_ON:
> > + retval = ( z ? statistic_stop(stat) : -EINVAL );
> > + break;
>
> Lots of unneeded parentheses there.

done

> > +static int statistic_reset(struct statistic *stat, struct statistic_info *info)
> > +{
> > + enum statistic_state prev_state = stat->state;
> > + int cpu;
> > +
> > + if (unlikely(stat->state < STATISTIC_STATE_OFF))
> > + return 0;
> > + statistic_transition(stat, info, STATISTIC_STATE_OFF);
> > + if (unlikely(info->flags & STATISTIC_FLAGS_NOINCR))
> > + statistic_reset_ptr(stat, stat->pdata);
> > + else
> > + for_each_cpu(cpu)
>
> for_each_possible_cpu() (maybe)

correct, done

> > +static inline int statistic_fdata(struct statistic_interface *interface, int i,
> > + struct statistic_file_private *fpriv)
> > +{
> > + struct statistic *stat = &interface->stat[i];
> > + struct statistic_info *info = &interface->info[i];
> > + struct statistic_discipline *disc = &statistic_discs[stat->type];
> > + struct statistic_merge_private mpriv;
> > + int retval;
> > +
> > + if (unlikely(stat->state < STATISTIC_STATE_OFF))
> > + return 0;
> > + if (unlikely(info->flags & STATISTIC_FLAGS_NOINCR))
> > + return disc->fdata(stat, info->name, fpriv, stat->pdata);
> > + mpriv.dst = statistic_alloc_ptr(stat, GFP_KERNEL, -1);
> > + if (unlikely(!mpriv.dst))
> > + return -ENOMEM;
> > + spin_lock_init(&mpriv.lock);
> > + mpriv.stat = stat;
> > + on_each_cpu(statistic_merge, &mpriv, 0, 1);
> > + retval = disc->fdata(stat, info->name, fpriv, mpriv.dst);
> > + statistic_free_ptr(stat, mpriv.dst);
> > + return retval;
> > +}
>
> You do like that `inline' thingy ;)

changed

> > +/* cpu hotplug handling for per-cpu data */
> > +
> > +static inline int _statistic_hotcpu(struct statistic_interface *interface,
> > + int i, unsigned long action, int cpu)
> > +{
> > + struct statistic *stat = &interface->stat[i];
> > + struct statistic_info *info = &interface->info[i];
> > +
> > + if (unlikely(info->flags & STATISTIC_FLAGS_NOINCR))
> > + return 0;
> > + if (stat->state < STATISTIC_STATE_OFF)
> > + return 0;
> > + switch (action) {
> > + case CPU_UP_PREPARE:
> > + stat->pdata->ptrs[cpu] = statistic_alloc_ptr(stat, GFP_ATOMIC,
> > + cpu_to_node(cpu));
> > + break;
>
> So this allocation can fail. Does all the other code handle that? If not,
> we should fail the CPU bringup.

done

> Dangit, this is inlined as well. It makes oops-tracing really hard :(

changed

> > +{
> > + statistic_root_dir = debugfs_create_dir(STATISTIC_ROOT_DIR, NULL);
> > + if (unlikely(!statistic_root_dir))
> > + return -ENOMEM;
> > + INIT_LIST_HEAD(&statistic_list);
> > + mutex_init(&statistic_list_mutex);
> > + register_cpu_notifier(&statistic_hotcpu_notifier);
>
> Actually, this can fail too (well, actually it can't, but the API suggests
> it can).

I see. grep indicates that nobody cares. I guess if
register_cpu_notifier ever starts failing, the kernel will do lots of
funny things anyway. I tend to leave my code the way it is for the time
being and to leave this kernel-wide job to someone else.

> > + int offset;
> > + struct list_head line_lh;
> > + char *nl;
> > + size_t line_size = 0;
> > +
> > + INIT_LIST_HEAD(&line_lh);
>
> LIST_HEAD(line_lh);

done

> > +
> > +/* code concerned with utilisation indicator statistic */
> > +
> > +static void statistic_reset_util(struct statistic *stat, void *ptr)
> > +{
> > + struct statistic_entry_util *util = ptr;
> > + util->num = 0;
> > + util->acc = 0;
> > + util->min = (~0ULL >> 1) - 1;
> > + util->max = -(~0ULL >> 1) + 1;
> > +}
>
> `min' is a large positive number and `max' is a large negative one. Is that
> right?
>
> `min' gets 0x7ffffffffffffffe, which seems to be off-by-one.

fixed off-by-one

> Consider using LLONG_MAX and friends.

changed

> > +static int statistic_fdata_util(struct statistic *stat, const char *name,
> > + struct statistic_file_private *fpriv,
> > + void *data)
> > +{
> > + struct sgrb_seg *seg;
> > + struct statistic_entry_util *util = data;
> > + unsigned long long whole = 0;
> > + signed long long min = 0, max = 0, decimal = 0, last_digit;
> > +
> > + seg = sgrb_seg_find(&fpriv->read_seg_lh, 128);
> > + if (unlikely(!seg))
> > + return -ENOMEM;
> > + if (likely(util->num)) {
> > + whole = util->acc;
> > + do_div(whole, util->num);
> > + decimal = util->acc * 10000;
> > + do_div(decimal, util->num);
> > + decimal -= whole * 10000;
> > + if (decimal < 0)
> > + decimal = -decimal;
> > + last_digit = decimal;
> > + do_div(last_digit, 10);
> > + last_digit = decimal - last_digit * 10;
> > + if (last_digit >= 5)
> > + decimal += 10;
> > + do_div(decimal, 10);
> > + min = util->min;
> > + max = util->max;
> > + }
> > + seg->offset += sprintf(seg->address + seg->offset,
> > + "%s %Lu %Ld %Ld.%03lld %Ld\n", name,
> > + (unsigned long long)util->num,
> > + (signed long long)min, whole, decimal,
> > + (signed long long)max);
>
> There's no need to cast `min' and `max' here. A cast would be needed if
> they were u64/s64.

done

> > +
> > +static inline int statistic_add_sparse_new(struct statistic_sparse_list *slist,
> > + s64 value, u64 incr)
> > +{
> > + struct statistic_entry_sparse *entry;
> > +
> > + if (unlikely(slist->entries == slist->entries_max))
> > + return -ENOMEM;
> > + entry = kmalloc(sizeof(struct statistic_entry_sparse), GFP_ATOMIC);
> > + if (unlikely(!entry))
> > + return -ENOMEM;
> > + entry->value = value;
> > + entry->hits = incr;
> > + slist->entries++;
> > + list_add_tail(&entry->list, &slist->entry_lh);
> > + return 0;
> > +}
> >
> > +static inline void _statistic_add_sparse(struct statistic_sparse_list *slist,
> > + s64 value, u64 incr)
> > +{
> > + struct list_head *head = &slist->entry_lh;
> > + struct statistic_entry_sparse *entry;
> > +
> > + list_for_each_entry(entry, head, list) {
> > + if (likely(entry->value == value)) {
> > + entry->hits += incr;
> > + statistic_add_sparse_sort(head, entry);
> > + return;
> > + }
> > + }
> > + if (unlikely(statistic_add_sparse_new(slist, value, incr)))
> > + slist->hits_missed += incr;
> > +}
>
> I hereby revoke your inlining license.

I admit my failure. Changed almost all of these places.

> > +static void statistic_set_sparse(struct statistic *stat, s64 value, u64 total)
> > +{
> > + struct statistic_sparse_list *slist = (struct statistic_sparse_list *)
> > + stat->pdata;
>
> Hang on, what's happening here? statistic.pdata is `struct percpu_data *'.
> That's
>
> struct percpu_data {
> void *ptrs[NR_CPUS];
> };
>
> How can we cast that to a statistic_sparse_list* and then start playing
> with it? We're supposed to use per_cpu_ptr() to get at the actual data.

With regard to the data that a statistic feeds on, there are are two
types of statistics: statistics that accumulate incremental updates
(pushed - probably frequently - through statistic_add() or
statistic_inc()), and statistics that accept total numbers (pulled
through statistic_set() only when read by user). We use per-cpu data for
the former. As to the latter, per-cpu data would be way to heavy.
That is why, my code is capable of dealing with both per-cpu data and
non-per-cpu data. Since a particular statistic is either per-cpu or
non-per-cpu, I use the same data pointer for both cases in order to keep
struct statistic as small as possible.

I admit the cast looks a bit fishy. But lines above are correct.

I should probably rename 'pdata' in struct statistic to 'data' (to
reflect its versatile use), change its type from struct percpu_data* to
void*, and finally use per_cpu_ptr. per_cpu_ptr works fine any pointer
type. The only issue with per_cpu_ptr is that I can't convert

stat->pdata->ptrs[cpu] = some_buf;

to

per_cpu_ptr(stat->pdata, cpu) = some_buf;

as done in my code instead of calling alloc_percpu() in order to avoid
eating up memory for offline or unavailable cpus. See cpu hotplug
handler.

2006-05-30 01:12:48

by Andrew Morton

[permalink] [raw]
Subject: Re: [Patch 5/6] statistics infrastructure

On Tue, 30 May 2006 00:17:08 +0200
Martin Peschke <[email protected]> wrote:

> > > +static void statistic_set_sparse(struct statistic *stat, s64 value, u64 total)
> > > +{
> > > + struct statistic_sparse_list *slist = (struct statistic_sparse_list *)
> > > + stat->pdata;
> >
> > Hang on, what's happening here? statistic.pdata is `struct percpu_data *'.
> > That's
> >
> > struct percpu_data {
> > void *ptrs[NR_CPUS];
> > };
> >
> > How can we cast that to a statistic_sparse_list* and then start playing
> > with it? We're supposed to use per_cpu_ptr() to get at the actual data.
>
> With regard to the data that a statistic feeds on, there are are two
> types of statistics: statistics that accumulate incremental updates
> (pushed - probably frequently - through statistic_add() or
> statistic_inc()), and statistics that accept total numbers (pulled
> through statistic_set() only when read by user). We use per-cpu data for
> the former. As to the latter, per-cpu data would be way to heavy.
> That is why, my code is capable of dealing with both per-cpu data and
> non-per-cpu data. Since a particular statistic is either per-cpu or
> non-per-cpu, I use the same data pointer for both cases in order to keep
> struct statistic as small as possible.
>
> I admit the cast looks a bit fishy. But lines above are correct.

<head spins>

Perhaps a suitable comment somewhere so people don't fall out of their
chairs when they see this like I did?

2006-05-30 08:07:08

by Heiko Carstens

[permalink] [raw]
Subject: Re: [Patch] statistics infrastructure - update 1

> case CPU_UP_PREPARE:
> stat->pdata->ptrs[cpu] = statistic_alloc_ptr(stat, GFP_ATOMIC,

Why not GFP_KERNEL?

> + if (!stat->pdata->ptrs[cpu])
> + return -ENOMEM;

NOTIFY_BAD instead of -ENOMEM, I guess.

> break;
> case CPU_UP_CANCELED:
> case CPU_DEAD:

I think your merge code (which gets called if CPU_UP_PREPARE fails) expects
stat->pdata->ptrs[cpu] to be non-zero, right?

2006-05-30 11:22:48

by Martin Peschke

[permalink] [raw]
Subject: Re: [Patch] statistics infrastructure - update 1

Heiko Carstens wrote:
>> case CPU_UP_PREPARE:
>> stat->pdata->ptrs[cpu] = statistic_alloc_ptr(stat, GFP_ATOMIC,
>
> Why not GFP_KERNEL?

I see. Schedule() is permitted in this context. Will change it.

>> + if (!stat->pdata->ptrs[cpu])
>> + return -ENOMEM;
>
> NOTIFY_BAD instead of -ENOMEM, I guess.

Not a bug, but slightly confusing. I think I will clean it up in my
next update patch.

>> break;
>> case CPU_UP_CANCELED:
>> case CPU_DEAD:
>
> I think your merge code (which gets called if CPU_UP_PREPARE fails) expects
> stat->pdata->ptrs[cpu] to be non-zero, right?

That's a bug and needs fixing (just bail out if pointer if zero).

Thanks, Martin

2006-05-30 11:36:03

by Martin Peschke

[permalink] [raw]
Subject: Re: [Patch 5/6] statistics infrastructure

Nikita Danilov wrote:
> Martin Peschke writes:
> > This patch adds statistics infrastructure as common code.
> >
>
> [...]
>
> > +
> > +static void statistic_add_util(struct statistic *stat, int cpu,
> > + s64 value, u64 incr)
> > +{
> > + struct statistic_entry_util *util = stat->pdata->ptrs[cpu];
> > + util->num += incr;
> > + util->acc += value * incr;
> > + if (unlikely(value < util->min))
> > + util->min = value;
> > + if (unlikely(value > util->max))
> > + util->max = value;
>
> One useful aggregate that can be calculated here is a standard
> deviation. Something like
>
> util->acc_sq += value * incr * value * incr; /* sum of squares */
>
> And in statistic_fdata_util() squared standard deviation is
>
> std_dev = util->acc_sq;
> /*
> * Difference of averaged square and squared average.
> */
> std_dev = do_div(std_dev, util->num) - whole * whole;
>
> > +}
>
> Nikita.

Excellent idea. I will add the standard deviation.

Thanks, Martin

2006-05-30 17:17:42

by Martin Peschke

[permalink] [raw]
Subject: Re: [Patch 5/6] statistics infrastructure

Andrew Morton wrote:
> Martin Peschke <[email protected]> wrote:
>> +static int statistic_alloc(struct statistic *stat,
>> + struct statistic_info *info)
>> +{
>> + int cpu;
>> + stat->age = sched_clock();
>
> argh. Didn't we end up finding a way to avoid this?
>
> At the least, we should have statistics_clock(), or nsec_clock(), or
> something which is decoupled from this low-level scheduler-internal thing,
> and which architectures can implement (vis attribute-weak) if they have a
> preferred/better/more-accurate alternative.

I use clocks for two purposes. Both have used sched_clock() so far.

The statistics infrastructure itself uses a clock only for time stamps
that tell users what time a statistic has been switched on/off and reset.
This is what you have spotted here.

(The other and more important requirement regards exploiters of the
statistics infrastructure. They need a clock to measure latencies,
which they can report then.)

Regarding those time stamps, I think it best to make them look like other
timestamps, specifically the printk() time stamps in order not to confuse
users. That is why, one of my patches introduces nsec_to_timestamp()
based on some lines from printk(). Printk() uses printk_clock() as
source, which is nothing else than a sched_clock() call, unless
reimpelmented by architectures (only done for ia64).
If I want similar timestamps, I need the same time source too.

Now my question:

Would I get away with making printk_clock() a timestamp_clock() that
should be used by anyone exporting nsec_to_timestamp()-formated time
stamps to user space, including me?

I would then continue to see the use of sched_clock() in printk_clock()
... aehm timestamp_clock() as somebody else's problem (or at least
as a subordinate problem).
Thoughts? <ducking down>

Martin

2006-05-30 19:15:25

by Andrew Morton

[permalink] [raw]
Subject: Re: [Patch 5/6] statistics infrastructure

On Tue, 30 May 2006 19:17:19 +0200
Martin Peschke <[email protected]> wrote:

> Would I get away with making printk_clock() a timestamp_clock() that
> should be used by anyone exporting nsec_to_timestamp()-formated time
> stamps to user space, including me?
>
> I would then continue to see the use of sched_clock() in printk_clock()
> ... aehm timestamp_clock() as somebody else's problem (or at least
> as a subordinate problem).

Sure, a generic kernel-wide nsec-resolution timestamp_clock() makes sense
to me.

The default implementation can use sched_clock() but arch maintainers
can/should override it (vai attribute-weak) and do somethnig better.