2006-05-19 16:08:01

by Martin Peschke

Subject: [Patch 0/6] statistics infrastructure

Andrew, please apply.

Changes since I last posted these patches:

- improvements as suggested on lkml
(documentation, comments, coding style, etc.)

- fixed race in statistic_add()/statistic_inc()
with regard to releasing statistics



My patch series is a proposal for a generic implementation of statistics.
Envisioned exploiters include device drivers and any other kernel component.
It provides both a unified programming interface for exploiters and a
unified user interface. It comes with a set of disciplines that
implement various ways of processing data, such as counters and histograms.
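
To make the idea of a discipline concrete, here is a minimal userspace
sketch. It is not code from the patch series. It assumes that a discipline
is a small table of function pointers deciding how a reported
(value, increment) pair is folded into stored data. All names in it
(struct discipline_model, struct stat_model, model_add(), model_inc())
are invented for illustration and only loosely mirror
statistic_add()/statistic_inc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HIST_BUCKETS 4

struct stat_model;

/* Hypothetical discipline: how one reported value is folded into the data. */
struct discipline_model {
	const char *name;
	void (*add)(struct stat_model *s, int64_t value, uint64_t incr);
};

struct stat_model {
	const struct discipline_model *disc;
	uint64_t total;                /* counter: sum of all increments */
	int64_t base;                  /* histogram: lower bound of bucket 1 */
	int64_t width;                 /* histogram: width of buckets 1..N-2 */
	uint64_t bucket[HIST_BUCKETS]; /* histogram: hits per bucket */
};

/* Counter discipline: the value is ignored, only increments are summed. */
static void counter_add(struct stat_model *s, int64_t value, uint64_t incr)
{
	(void)value;
	s->total += incr;
}

/*
 * Linear histogram discipline: bucket 0 catches values below base,
 * the last bucket is open-ended.
 */
static void histogram_add(struct stat_model *s, int64_t value, uint64_t incr)
{
	size_t i = 0;

	assert(s->width > 0);
	if (value >= s->base) {
		i = (size_t)((value - s->base) / s->width) + 1;
		if (i >= HIST_BUCKETS)
			i = HIST_BUCKETS - 1;
	}
	s->bucket[i] += incr;
}

static const struct discipline_model counter_discipline = {
	"counter", counter_add
};
static const struct discipline_model histogram_discipline = {
	"histogram", histogram_add
};

/* The exploiter reports data the same way whatever the discipline is. */
static void model_add(struct stat_model *s, int64_t value, uint64_t incr)
{
	s->disc->add(s, value, incr);
}

static void model_inc(struct stat_model *s, int64_t value)
{
	model_add(s, value, 1);
}
```

The point of the sketch is that the reporting call stays the same when
the way of processing the data changes from a plain counter to a
histogram; only the discipline pointer differs.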

The recent rework addresses performance issues and memory footprint,
straightens some concepts out, streamlines the programming interface,
removes some weirdness from the user interface, reduces the amount of
code, and moves the exploitation according to last time's feedback.

A few more keywords for the reader's convenience:
based on per-cpu data; spinlock-free protection of data; observes
cpu-hot(un)plug for efficient memory use; tiny state machine for
switching-on, switching-off, releasing data etc.; configurable by users
at run-time; still sitting in debugfs; simple addition of other disciplines.
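
As a rough illustration of "per-cpu data" with "spinlock-free
protection", here is a single-threaded userspace model. It is a sketch
under assumptions, not the code from the patches: each CPU gets its own
cache-line-sized slot, an update touches only the slot of the CPU it
runs on, and a reader sums all slots. The names percpu_counter_model,
percpu_model_add() and percpu_model_read(), the fixed CPU count and the
64-byte cache line are all made up for the example. In the kernel, the
updater would additionally have to stay on its CPU for the duration of
the update, which this model does not show.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS   4   /* assumption: fixed number of CPUs */
#define MODEL_CACHELINE 64  /* assumption: 64-byte cache lines */

/* One slot per CPU, padded so that two CPUs never share a cache line. */
struct percpu_slot {
	uint64_t count;
	char pad[MODEL_CACHELINE - sizeof(uint64_t)];
};

struct percpu_counter_model {
	int enabled; /* stand-in for the on/off part of the state machine */
	struct percpu_slot slot[MODEL_NR_CPUS];
};

/* Update path: no lock, only the caller's own slot is written. */
static void percpu_model_add(struct percpu_counter_model *c,
			     unsigned int cpu, uint64_t incr)
{
	assert(cpu < MODEL_NR_CPUS);
	if (!c->enabled)
		return; /* switched off: data gathering costs almost nothing */
	c->slot[cpu].count += incr;
}

/* Read path: merge the per-CPU slots into one result. */
static uint64_t percpu_model_read(const struct percpu_counter_model *c)
{
	uint64_t sum = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += c->slot[cpu].count;
	return sum;
}
```

The trade-off this shows is that updates are cheap and never contend,
while the cost of merging is moved to the (rare) reader.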

Good places to start reading code are:

statistic_create(), statistic_remove()
statistic_add(), statistic_inc()
struct statistic_interface, struct statistic
struct statistic_discipline, statistic_*_counter()
statistic_transition()

Martin




2006-05-19 16:24:24

by Andrew Morton

Subject: Re: [Patch 0/6] statistics infrastructure

Martin Peschke <[email protected]> wrote:
>
> My patch series is a proposal for a generic implementation of statistics.

This uses debugfs for the user interface, but the
per-task-delay-accounting-*.patch series from Balbir creates an extensible
netlink-based system for passing instrumentation results back to userspace.

Can this code be converted to use those netlink interfaces, or is Balbir's
approach unsuitable, or hasn't it even been considered, or what?

2006-05-19 19:02:32

by Balbir Singh

Subject: Re: [Patch 0/6] statistics infrastructure

On 5/20/06, Balbir Singh <[email protected]> wrote:
>
> On 5/19/06, Andrew Morton <[email protected]> wrote:
>
> > Martin Peschke <[email protected]> wrote:
> > >
> > > My patch series is a proposal for a generic implementation of statistics.
> >
> > This uses debugfs for the user interface, but the
> > per-task-delay-accounting-*.patch series from Balbir creates an extensible
> > netlink-based system for passing instrumentation results back to userspace.
> >
> > Can this code be converted to use those netlink interfaces, or is Balbir's
> > approach unsuitable, or hasn't it even been considered, or what?
> >
> >
>
Hi, Martin/Andrew,

I am resending this email, my mailer got crazy and sent out HTML (sorry!)

I have seen the patches around, but I've had no time to review them. I
was planning to do so this weekend.

The main difference I see, as you pointed out, is the netlink
interface vs debugfs. I think the netlink approach is more suitable
(there is no need to mount a filesystem and create files followed by
frequent open/read/close operations). Just one netlink socket should
do the trick. The event subscription mechanism in netlink is very
useful as well.

Martin, could you please take a look at the taskstats interface and
see if it is possible to make use of it?

Thanks,
Balbir

2006-05-19 23:03:56

by Martin Peschke

Subject: Re: [Patch 0/6] statistics infrastructure

Andrew Morton wrote:
> Martin Peschke <[email protected]> wrote:
>> My patch series is a proposal for a generic implementation of statistics.
>
> This uses debugfs for the user interface, but the
> per-task-delay-accounting-*.patch series from Balbir creates an extensible
> netlink-based system for passing instrumentation results back to userspace.
>
> Can this code be converted to use those netlink interfaces, or is Balbir's
> approach unsuitable, or hasn't it even been considered, or what?

Andrew, Balbir,

I will read Balbir's patches. Probably, I won't manage it this weekend,
as a friend of mine is visiting.

Why doesn't it come as a surprise that the user interface appears to
restart the discussion? ;-)
I can't comment on netlink yet. There are some thoughts on why I
chose debugfs in my documentation file.

Balbir, could you try to summarise briefly what the main issues are that
your patches solve?

To summarise the issues I want to solve with my patches:

First, we have a requirement to provide statistics for our FCP attachment
(transport latencies, utilisation of likely bottlenecks, etc.),
mostly for customer service reasons. This is what the small exploitation
patches are about.

Second, I thought it useful to get there by implementing and using a generic
statistics infrastructure that could be called by other kernel components.
This is what the bulk of my patches and all of the documentation is about.
Debugfs is just one aspect of it (- it shouldn't be too difficult to rip
it out and use some other transport). But, there are other features like
the various modes for accumulating data, and that the on-the-fly data
processing is configurable by users to a certain degree.

Martin


2006-05-21 11:33:27

by Balbir Singh

Subject: Re: [Patch 0/6] statistics infrastructure

> Andrew, Balbir,
>
> I will read Balbir's patches. Probably, I won't manage it this weekend,
> as a friend of mine is visiting.
>
> Why doesn't it come as a surprise that the user interface appears to
> restart the discussion? ;-)
> I can't comment on netlink yet. There are some thoughts on why I
> chose debugfs in my documentation file.
>
> Balbir, could you try to summarise briefly what the main issues are that
> your patches solve?
>

We collect statistics about the delays experienced by each task on
the system. Note that this information is per-task.

The data collected tells us the number of times the task executed on
the runqueue, the delay it encountered waiting for CPU (run_delay), and
the total time it spent on the runqueue. Similar statistics are
collected for block io and swapin block io.
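
For readers who want a picture of such a per-task record, here is a
hypothetical userspace sketch. It does not reproduce the layout used by
the delay accounting patches; the struct and function names
(delay_pair_model, task_delay_model, delay_record(),
delay_average_ns()) and the use of nanoseconds are assumptions made for
the example. Each kind of delay is kept as a count plus a running total,
from which an average can be derived.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One kind of delay: how often it happened and how long it took in total. */
struct delay_pair_model {
	uint64_t count;    /* number of delay events seen */
	uint64_t delay_ns; /* accumulated waiting time, in nanoseconds */
};

/* Hypothetical per-task record following the description above. */
struct task_delay_model {
	struct delay_pair_model cpu;    /* waiting for a CPU (run_delay) */
	uint64_t runqueue_ns;           /* total time spent on the runqueue */
	struct delay_pair_model blkio;  /* waiting for block io */
	struct delay_pair_model swapin; /* waiting for swapin block io */
};

/* Account one completed wait of waited_ns nanoseconds. */
static void delay_record(struct delay_pair_model *d, uint64_t waited_ns)
{
	assert(d != NULL);
	d->count++;
	d->delay_ns += waited_ns;
}

/* Average wait per event; 0 if nothing has been recorded yet. */
static uint64_t delay_average_ns(const struct delay_pair_model *d)
{
	assert(d != NULL);
	return d->count ? d->delay_ns / d->count : 0;
}
```

Keeping count and total separately lets userspace compute averages over
any interval by subtracting two snapshots of the same task.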

The statistics can be queried at any time (during the lifetime of the task)
and user space can be notified of the statistics when the task exits.

More detailed information can be found at
http://lkml.org/lkml/2006/5/2/30

and in the Documentation/accounting directory tree in -mm

I hope this is the summary you were looking for.

Warm Regards,
Balbir Singh,
Linux Technology Center,
IBM Software Labs

2006-05-22 18:09:36

by Tim Bird

Subject: netlink vs. debugfs (was Re: [Patch 0/6] statistics infrastructure)

Andrew Morton wrote:
> Martin Peschke <[email protected]> wrote:
>> My patch series is a proposal for a generic implementation of statistics.
>
> This uses debugfs for the user interface, but the
> per-task-delay-accounting-*.patch series from Balbir creates an extensible
> netlink-based system for passing instrumentation results back to userspace.
>
> Can this code be converted to use those netlink interfaces, or is Balbir's
> approach unsuitable, or hasn't it even been considered, or what?

Can someone give me the 20-second elevator pitch on why
netlink is preferred over debugfs? I've heard of a
number of debugfs/procfs users requested to switch over.

Thanks,
-- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Electronics
=============================

2006-05-22 18:38:34

by Balbir Singh

Subject: Re: netlink vs. debugfs (was Re: [Patch 0/6] statistics infrastructure)

On Mon, May 22, 2006 at 11:09:22AM -0700, Tim Bird wrote:
> Andrew Morton wrote:
> > Martin Peschke <[email protected]> wrote:
> >> My patch series is a proposal for a generic implementation of statistics.
> >
> > This uses debugfs for the user interface, but the
> > per-task-delay-accounting-*.patch series from Balbir creates an extensible
> > netlink-based system for passing instrumentation results back to userspace.
> >
> > Can this code be converted to use those netlink interfaces, or is Balbir's
> > approach unsuitable, or hasn't it even been considered, or what?
>
> Can someone give me the 20-second elevator pitch on why
> netlink is preferred over debugfs? I've heard of a
> number of debugfs/procfs users requested to switch over.
>
> Thanks,
> -- Tim
>
> =============================
> Tim Bird
> Architecture Group Chair, CE Linux Forum
> Senior Staff Engineer, Sony Electronics
> =============================

Hi, Tim,

I am no debugfs expert, but I hope I can do justice to the comparison.

    Debugfs                                     Netlink/Genetlink

1.  Filesystem based - requires creating        Several types of data can
    files for each type of data passed          be multiplexed over one netlink
    down                                        socket.
2.  Hard to determine record format/data        Contains metadata including
                                                type of data and length
                                                with each record
3.  Notifications are hard                      Notifications are very easy
    I think they can be done using inotify      good library support for
                                                notifications. Data can
                                                either be broadcast or
                                                selectively multicast
4.  Requires several open/read/write/close      A single socket can be
    operations                                  opened, data from kernel
                                                space can be multiplexed
                                                over it.
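
Point 2 of the comparison (type and length carried with each record) can
be sketched in plain userspace C. This is a simplified model of a
type-length-value record in the style of netlink attributes: a 16-bit
length covering header plus payload, a 16-bit type, and padding to a
4-byte boundary. It deliberately does not use the kernel headers; the
names tlv_hdr, tlv_put() and tlv_find() and the type numbers in the
usage note are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round a length up to the next multiple of 4. */
#define TLV_ALIGN(len) (((size_t)(len) + 3) & ~(size_t)3)

struct tlv_hdr {
	uint16_t len;  /* header plus payload, excluding padding */
	uint16_t type; /* tells the receiver how to interpret the payload */
};

/* Append one record at offset off; returns the new offset, or 0 on error. */
static size_t tlv_put(unsigned char *buf, size_t size, size_t off,
		      uint16_t type, const void *data, uint16_t data_len)
{
	struct tlv_hdr hdr;
	size_t raw = sizeof(hdr) + (size_t)data_len;
	size_t total = TLV_ALIGN(raw);

	assert(buf != NULL);
	if (raw > UINT16_MAX || off > size || total > size - off)
		return 0;
	hdr.len = (uint16_t)raw;
	hdr.type = type;
	memcpy(buf + off, &hdr, sizeof(hdr));
	if (data_len)
		memcpy(buf + off + sizeof(hdr), data, data_len);
	memset(buf + off + raw, 0, total - raw); /* zero the padding */
	return off + total;
}

/* Find the first record of the given type; NULL if absent or malformed. */
static const unsigned char *tlv_find(const unsigned char *buf, size_t used,
				     uint16_t type, uint16_t *data_len)
{
	size_t off = 0;

	assert(buf != NULL);
	while (off <= used && used - off >= sizeof(struct tlv_hdr)) {
		struct tlv_hdr hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > used - off)
			return NULL; /* malformed record */
		if (hdr.type == type) {
			*data_len = (uint16_t)(hdr.len - sizeof(hdr));
			return buf + off + sizeof(hdr);
		}
		off += TLV_ALIGN(hdr.len); /* skip payload and padding */
	}
	return NULL;
}
```

A sender could, for instance, put a 64-bit latency under type 1 and a
32-bit queue depth under type 2 into the same buffer. A receiver that
only knows type 2 can still skip over type 1 by its length, which is
what makes several kinds of data easy to multiplex over one socket.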

I don't think I did any justice to the advantages of debugfs. The only
one I can think of is that it uses relayfs. Relayfs is efficient in the
sense that it uses per-cpu buffers.

Anybody else want to take a shot in comparing the two?

Balbir Singh,
Linux Technology Center,
IBM Software Labs

2006-05-22 18:53:40

by Evgeniy Polyakov

Subject: Re: netlink vs. debugfs (was Re: [Patch 0/6] statistics infrastructure)

On Tue, May 23, 2006 at 12:04:00AM +0530, Balbir Singh ([email protected]) wrote:
> Anybody else want to take a shot in comparing the two?

Netlink is always present in the kernel, so there is no need to add
a dependency on a special FS.
But the number of netlink sockets is not that big, so use a new one if you
are creating a really generic mechanism, or consider using connector/genetlink.

--
Evgeniy Polyakov

2006-05-23 16:59:47

by Martin Peschke

Subject: Re: [Patch 0/6] statistics infrastructure

Andrew Morton wrote:
> Martin Peschke <[email protected]> wrote:
>> My patch series is a proposal for a generic implementation of statistics.
>
> This uses debugfs for the user interface, but the
> per-task-delay-accounting-*.patch series from Balbir creates an extensible
> netlink-based system for passing instrumentation results back to userspace.
>
> Can this code be converted to use those netlink interfaces, or is Balbir's
> approach unsuitable, or hasn't it even been considered, or what?
>

Andrew,

taskstats, Balbir's approach, is too specific and doesn't work for me.
It is by design limited to per-task data.

My statistics code is not limited to per-task statistics, but allows exploiters
to have data accumulated and shown for whatever entity they need,
be it for tasks, for SCSI disks, per adapter, per queue, per interface,
for a device driver, etc.

If you want me to change my code to use netlink anyway, I might be able to
implement my own genetlink family. I haven't looked at the details of that yet.

Martin



2006-05-23 20:44:28

by Al Boldi

Subject: Re: [Patch 0/6] statistics infrastructure

Martin Peschke wrote:
> Andrew Morton wrote:
> > Martin Peschke <[email protected]> wrote:
> >> My patch series is a proposal for a generic implementation of
> >> statistics.
> >
> > This uses debugfs for the user interface, but the
> > per-task-delay-accounting-*.patch series from Balbir creates an
> > extensible netlink-based system for passing instrumentation results back
> > to userspace.
> >
> > Can this code be converted to use those netlink interfaces, or is
> > Balbir's approach unsuitable, or hasn't it even been considered, or
> > what?
>
> Andrew,
>
> taskstats, Balbir's approach, is too specific and doesn't work for me.
> It is by design limited to per-task data.
>
> My statistics code is not limited to per-task statistics, but allows
> exploiters to have data accumulated and shown for whatever
> entity they need, be it for tasks, for SCSI disks, per adapter, per
> queue, per interface, for a device driver, etc.

How does your work and Balbir's and CKRM relate to each other?

Is there not a way to abstract your works to provide a common statistics
infrastructure for all?

Thanks!

--
Al

2006-05-23 21:40:18

by Andrew Morton

Subject: Re: [Patch 0/6] statistics infrastructure

Martin Peschke <[email protected]> wrote:
>
> Andrew Morton wrote:
> > Martin Peschke <[email protected]> wrote:
> >> My patch series is a proposal for a generic implementation of statistics.
> >
> > This uses debugfs for the user interface, but the
> > per-task-delay-accounting-*.patch series from Balbir creates an extensible
> > netlink-based system for passing instrumentation results back to userspace.
> >
> > Can this code be converted to use those netlink interfaces, or is Balbir's
> > approach unsuitable, or hasn't it even been considered, or what?
> >
>
> Andrew,
>
> taskstats, Balbir's approach, is too specific and doesn't work for me.
> It is by design limited to per-task data.

OK. They are pretty different things.

Balbir, do you see any sane way in which the APIs you've implemented can be
extended to cover this requirement?

> My statistics code is not limited to per-task statistics, but allows exploiters
> to have data accumulated and shown for whatever entity they need,
> be it for tasks, for SCSI disks, per adapter, per queue, per interface,
> for a device driver, etc.

OK.

> If you want me to change my code to use netlink anyway, I might be able to
> implement my own genetlink family. I haven't looked at the details of that yet.
>

Well, a debugfs interface _should_ be OK. If not, why do we need debugfs?

Ho hum, hard. Please send the patches again, let's take a closer look, see
if we can move them forward a bit.

2006-05-24 03:12:12

by Balbir Singh

Subject: Re: [Patch 0/6] statistics infrastructure

<snip>
> > Andrew,
> >
> > taskstats, Balbir's approach, is too specific and doesn't work for me.
> > It is by design limited to per-task data.
>
> OK. They are pretty different things.
>
> Balbir, do you see any sane way in which the APIs you've implemented can be
> extended to cover this requirement?

I'll work with Martin on that. If Martin decides to move to
netlink/genetlink we could search for some common ground w.r.t.
transferring data to user space, but IMHO it's going to be hard; our API
is meant to be used in task context. I think the two sets of statistics
target different use cases (one is device driver oriented and the
other is task oriented).

>
> > My statistics code is not limited to per-task statistics, but allows exploiters
> > to have data accumulated and shown for whatever entity they need,
> > be it for tasks, for SCSI disks, per adapter, per queue, per interface,
> > for a device driver, etc.
>
> OK.
>
> > If you want me to change my code to use netlink anyway, I might be able to
> > implement my own genetlink family. I haven't looked at the details of that yet.
> >
>
> Well, a debugfs interface _should_ be OK. If not, why do we need debugfs?
>
> Ho hum, hard. Please send the patches again, let's take a closer look, see
> if we can move them forward a bit.
>

Warm Regards,
Balbir
Linux Technology Center,
India Software Labs,
Bangalore