2011-02-07 22:31:00

by Chris Friesen

Subject: RFC: /proc/<pid>/sched should contain cumulative data for all threads in process


Hi,

We've got a tool that gathers lots of scheduling data for each process
(not task/thread) on the system.

For /proc/<pid>/{stat,io} this is straightforward, as the per-thread
values are summed together for the process as a whole.

However, /proc/<pid>/sched only shows the data for the individual thread
with the same tid as the pid. To get a per-process view we need to
manually scan all the threads and sum them--and this can get expensive
due to all the extra file operations, parsing, etc.
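For illustration, the manual per-thread scan described above could look roughly like the following Python sketch. It is not the tool mentioned in this message. The helper names `parse_sched` and `sum_field_over_threads` and the `proc_root` parameter are hypothetical, and the sketch assumes each data line in the file has the form `name : value`.

```python
import os


def parse_sched(text):
    """Parse 'name : value' lines into a dict of floats; skip everything else."""
    values = {}
    for line in text.splitlines():
        name, sep, raw = line.partition(":")
        if not sep:
            continue  # separator lines and anything without a colon
        try:
            values[name.strip()] = float(raw)
        except ValueError:
            continue  # e.g. the header line "comm (pid, #threads: N)"
    return values


def sum_field_over_threads(pid, field, proc_root="/proc"):
    """Sum one field over every <proc_root>/<pid>/task/<tid>/sched file."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    total = 0.0
    for tid in os.listdir(task_dir):
        path = os.path.join(task_dir, tid, "sched")
        try:
            # One open/read/parse per thread: this is the cost being discussed.
            with open(path) as f:
                total += parse_sched(f.read()).get(field, 0.0)
        except OSError:
            continue  # thread exited between listdir() and open()
    return total
```

A call such as `sum_field_over_threads(1234, "se.sum_exec_runtime")` would then open and parse one file per thread, which is where the extra file operations come from.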

Was this a conscious design decision, or just an oversight? Would a
patch converting it to whole-process values be accepted or is it enough
of a standard interface that we can't break existing apps that expect
the current behaviour?

Thanks,
Chris

--
Chris Friesen
Software Developer
GENBAND
[email protected]
http://www.genband.com


2011-02-08 09:20:06

by Peter Zijlstra

Subject: Re: RFC: /proc/<pid>/sched should contain cumulative data for all threads in process

On Mon, 2011-02-07 at 16:29 -0600, Chris Friesen wrote:
> Hi,
>
> We've got a tool that gathers lots of scheduling data for each process
> (not task/thread) on the system.
>
> For /proc/<pid>/{stat,io} this is straightforward, as the per-thread
> values are summed together for the process as a whole.
>
> However, /proc/<pid>/sched only shows the data for the individual thread
> with the same tid as the pid. To get a per-process view we need to
> manually scan all the threads and sum them--and this can get expensive
> due to all the extra file operations, parsing, etc.
>
> Was this a conscious design decision, or just an oversight? Would a
> patch converting it to whole-process values be accepted or is it enough
> of a standard interface that we can't break existing apps that expect
> the current behaviour?

I'd sooner remove all that stuff than extend it; it's an ABI liability,
especially since you're talking about tools parsing this stuff.

2011-02-08 12:11:57

by Ingo Molnar

Subject: Re: RFC: /proc/<pid>/sched should contain cumulative data for all threads in process


* Peter Zijlstra wrote:

> On Mon, 2011-02-07 at 16:29 -0600, Chris Friesen wrote:
> > Hi,
> >
> > We've got a tool that gathers lots of scheduling data for each process
> > (not task/thread) on the system.
> >
> > For /proc/<pid>/{stat,io} this is straightforward, as the per-thread
> > values are summed together for the process as a whole.
> >
> > However, /proc/<pid>/sched only shows the data for the individual thread
> > with the same tid as the pid. To get a per-process view we need to
> > manually scan all the threads and sum them--and this can get expensive
> > due to all the extra file operations, parsing, etc.
> >
> > Was this a conscious design decision, or just an oversight? Would a
> > patch converting it to whole-process values be accepted or is it enough
> > of a standard interface that we can't break existing apps that expect
> > the current behaviour?
>
> I'd sooner remove all that stuff than extend it; it's an ABI liability,
> especially since you're talking about tools parsing this stuff.

So assuming a tool wants to capture such stats of the system, what would its
options be? Could we do all this via system-wide counters and cheap, transparent,
perf stat-like gathering, without having to patch/rebuild the kernel?

Thanks,

Ingo

2011-02-08 13:44:39

by Peter Zijlstra

Subject: Re: RFC: /proc/<pid>/sched should contain cumulative data for all threads in process

On Tue, 2011-02-08 at 13:11 +0100, Ingo Molnar wrote:
> * Peter Zijlstra wrote:
>
> > On Mon, 2011-02-07 at 16:29 -0600, Chris Friesen wrote:
> > > Hi,
> > >
> > > We've got a tool that gathers lots of scheduling data for each process
> > > (not task/thread) on the system.
> > >
> > > For /proc/<pid>/{stat,io} this is straightforward, as the per-thread
> > > values are summed together for the process as a whole.
> > >
> > > However, /proc/<pid>/sched only shows the data for the individual thread
> > > with the same tid as the pid. To get a per-process view we need to
> > > manually scan all the threads and sum them--and this can get expensive
> > > due to all the extra file operations, parsing, etc.
> > >
> > > Was this a conscious design decision, or just an oversight? Would a
> > > patch converting it to whole-process values be accepted or is it enough
> > > of a standard interface that we can't break existing apps that expect
> > > the current behaviour?
> >
> > I'd sooner remove all that stuff than extend it; it's an ABI liability,
> > especially since you're talking about tools parsing this stuff.
>
> So assuming a tool wants to capture such stats of the system, what would its
> options be? Could we do all this via system-wide counters and cheap, transparent,
> perf stat-like gathering, without having to patch/rebuild the kernel?

Very much depends on what is wanted, but most of the stuff inside those
files is very specific to the implementation and pinning any of that to
an ABI is like silly.

2011-02-08 15:14:56

by Ingo Molnar

Subject: Re: RFC: /proc/<pid>/sched should contain cumulative data for all threads in process


* Peter Zijlstra wrote:

> Very much depends on what is wanted, but most of the stuff inside those files is
> very specific to the implementation and pinning any of that to an ABI is like
> silly.

Why? If the implementation changes then those values lose meaning and the most
correct value to report is *zero* and that's it.

That's how such things have always been done; it has never been a problem in the
past 15 years. Why should it suddenly start being a problem?

Thanks,

Ingo

2011-02-09 17:28:58

by Chris Friesen

Subject: Re: RFC: /proc/<pid>/sched should contain cumulative data for all threads in process

On 02/08/2011 07:45 AM, Peter Zijlstra wrote:
> On Tue, 2011-02-08 at 13:11 +0100, Ingo Molnar wrote:

>> So assuming a tool wants to capture such stats of the system, what would its
>> options be? Could we do all this via system-wide counters and cheap, transparent,
>> perf stat-like gathering, without having to patch/rebuild the kernel?
>
> Very much depends on what is wanted, but most of the stuff inside those
> files is very specific to the implementation and pinning any of that to
> an ABI is like silly.

Currently we're using the following fields from
/proc/<pid>/task/<tid>/sched:

sum_exec_runtime
wait_sum
wait_max
exec_max
iowait_sum
iowait_count
nr_switches

If there's a better way to get this information (with the precision
available from this interface) then I'd love to hear about it.
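As a rough sketch of what a whole-process view of the fields listed above might look like when computed in user space, the Python below walks every thread of a process and combines the values. Several things here are assumptions rather than anything stated in this thread: the counters are summed while the two `*_max` fields take the maximum across threads; a name in the file may carry a dotted prefix (for example `se.`), so only the last component is matched; and a field missing from the file counts as zero. The names `read_fields`, `process_sched`, and `proc_root` are hypothetical.

```python
import os

# Assumed combination rules: add up counters, keep the largest of the maxima.
SUM_FIELDS = ("sum_exec_runtime", "wait_sum", "iowait_sum",
              "iowait_count", "nr_switches")
MAX_FIELDS = ("wait_max", "exec_max")


def read_fields(path):
    """Return {short_name: value} for every 'name : value' line in one file."""
    fields = {}
    with open(path) as f:
        for line in f:
            name, sep, raw = line.partition(":")
            if not sep:
                continue
            try:
                value = float(raw)
            except ValueError:
                continue  # header line or other non-numeric content
            # Keep only the last dotted component of the name.
            fields[name.strip().rsplit(".", 1)[-1]] = value
    return fields


def process_sched(pid, proc_root="/proc"):
    """Combine the selected fields over all threads of one process."""
    totals = dict.fromkeys(SUM_FIELDS + MAX_FIELDS, 0.0)
    task_dir = os.path.join(proc_root, str(pid), "task")
    for tid in os.listdir(task_dir):
        try:
            fields = read_fields(os.path.join(task_dir, tid, "sched"))
        except OSError:
            continue  # thread went away while we were scanning
        for name in SUM_FIELDS:
            totals[name] += fields.get(name, 0.0)
        for name in MAX_FIELDS:
            totals[name] = max(totals[name], fields.get(name, 0.0))
    return totals
```

Parsing the values as floats keeps whatever fractional part the file prints, at the cost of binary rounding; a tool that needs exact values could parse them with `decimal.Decimal` instead.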

Chris

--
Chris Friesen
Software Developer
GENBAND
[email protected]
http://www.genband.com