2006-09-22 19:18:06

by Roland

Subject: I/O statistics per process

Hello list,

it's great that linux now has i/o re-nice with cfq, but how can the
admin determine HOW MUCH i/o a process is actually generating?

i have seen this on windows (process explorer from sysinternals) and on
solaris (psio http://users.tpg.com.au/bdgcvb/psio.html and pio
http://www.stormloader.com/yonghuang/freeware/pio.html), but what's the
linux (commandline) equivalent?

is there a modified top/ps with an i/o column, or is something still
missing at the kernel level to get those counters from?


regards
Roland K.
systems engineer.


2006-09-24 03:04:06

by Wu Fengguang

Subject: Re: I/O statistics per process

On Fri, Sep 22, 2006 at 09:12:05PM +0200, roland wrote:
> is there a modified top/ps with i/o column, or is there yet missing
> something at the kernel level for getting that counters from ?

Red Flag (http://www.redflag-linux.com/eindex.html) has developed an
iotop based on kprobes/systemtap. You can contact them if necessary.

2006-09-27 21:21:43

by Roland

Subject: Re: I/O statistics per process

thanks. i tried to contact redflag, but they don't answer. maybe support is
on holiday...!?

linux kernel hackers - is there really no standard way to watch i/o metrics
(bytes read/written) at the process level?

it's extremely hard for the admin to track down which process is hogging the
disk - especially if there is more than one task consuming cpu.

meanwhile i found blktrace and read the documentation. it looks really cool
and seems to be a very powerful tool - but it seems a little bit
"oversized" and not the right tool for this. it seems to be meant for
tracing/debugging/analysis.

what about http://lkml.org/lkml/2005/9/12/89 "with following patch,
userspace processes/utilities will be able to access per process I/O
statistics. for example, top like utilites can use this information", which
was posted to lkml one year ago? any update on this?

roland



2006-09-27 22:55:57

by Andrew Morton

Subject: Re: I/O statistics per process

On Wed, 27 Sep 2006 23:22:02 +0200
"roland" <[email protected]> wrote:

> thanks. tried to contact redflag, but they don`t answer. maybe support is
> being on holiday.... !?
>
> linux kernel hackers - there is really no standard way to watch i/o metrics
> (bytes read/written) at process level?

The patch csa-accounting-taskstats-update.patch in current -mm kernels
(which is planned for 2.6.19) does have per-process chars-read and
chars-written accounting ("Extended accounting fields"). That's probably
not what you really want, although it might tell you what you want to know.

> it's extremely hard for the admin to track down, what process is hogging the
> disk - especially if there is more than one task consuming cpu.

Sure. Doing this is actually fairly tricky because disk writes are almost
always deferred. We'd need to remember which process dirtied some memory,
then track that info all the way down to the disk IO level, then perform
the accounting operations at IO submit-time or completion time, on a
per-page basis. It isn't rocket-science, but it's a lot of stuff and some
overhead.
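
The bookkeeping described above can be sketched as a user-space toy model.
This is not kernel code: the toy_* names, the fixed-size page array and the
4096-byte page size are all assumptions made for illustration. It only shows
the idea of remembering which process dirtied a page and charging that
process when the write is finally submitted.

```c
#include <string.h>

#define TOY_PAGE_SIZE 4096UL  /* assumed page size for the model */
#define TOY_NPAGES    8       /* pages in the toy page cache */
#define TOY_NPIDS     16      /* pids 0..15 are valid in the model */

/* One cached page: is it dirty, and which pid dirtied it first? */
struct toy_page {
    int dirty;
    int owner_pid;
};

struct toy_cache {
    struct toy_page pages[TOY_NPAGES];
    unsigned long written[TOY_NPIDS];  /* bytes charged per pid at submit time */
};

void toy_init(struct toy_cache *c)
{
    memset(c, 0, sizeof(*c));
}

/* A process modifies a page. Nothing reaches the disk yet; the model only
 * remembers who turned the page from clean to dirty. */
void toy_dirty_page(struct toy_cache *c, int pid, int page)
{
    if (page < 0 || page >= TOY_NPAGES || pid < 0 || pid >= TOY_NPIDS)
        return;
    if (!c->pages[page].dirty) {
        c->pages[page].dirty = 1;
        c->pages[page].owner_pid = pid;
    }
}

/* Writeback runs later, in some unrelated context. Each submitted page is
 * charged to the remembered owner, not to whoever calls this function.
 * Returns the number of pages submitted. */
int toy_writeback(struct toy_cache *c)
{
    int i, submitted = 0;

    for (i = 0; i < TOY_NPAGES; i++) {
        if (!c->pages[i].dirty)
            continue;
        c->written[c->pages[i].owner_pid] += TOY_PAGE_SIZE;
        c->pages[i].dirty = 0;
        submitted++;
    }
    return submitted;
}
```

With pid 3 dirtying pages 0 and 1 and pid 5 dirtying page 2, a later
toy_writeback() charges 8192 bytes to pid 3 and 4096 bytes to pid 5,
regardless of which context runs the writeback.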

> meanwhile i found blktrace and read into the documenation. looks really cool
> and seems to be very powerful tool - but it it`s seems a little bit
> "oversized" and not the right tool for this. seems to be for
> tracing/debugging/analysis
>
> what about http://lkml.org/lkml/2005/9/12/89 "with following patch,
> userspace processes/utilities will be able to access per process I/O
> statistics. for example, top like utilites can use this information" which
> has been posted to lkml one year ago ? any update on this ?

csa-accounting-taskstats-update.patch makes that information available to
userspace.

But it's approximate, because

- it doesn't account for disk readahead

- it doesn't account for pagefault-initiated reads (although it easily
  could - Jay?)

- it overaccounts for a process writing to an already-dirty page.

  (We could fix this too: nuke the existing stuff and do

        current->wchar += PAGE_CACHE_SIZE;

  in __set_page_dirty_[no]buffers().) (But that ends up being wrong if
  someone truncates the file before it gets written)

- it doesn't account for file readahead (although it easily could)

- it doesn't account for pagefault-initiated readahead (it could)


hm. There's actually quite a lot we could do here to make these fields
more accurate and useful. A lot of this depends on what the definition of
these fields _is_. Is it just for disk IO? Is it supposed to include
console IO, or what?
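
A rough illustration of the over-accounting point and the truncate caveat,
as a user-space toy model with hypothetical names (toy_task, dirtied_bytes,
cancelled_bytes); it does not reproduce the kernel's actual data structures.
Counting bytes per write call inflates the total when the same page is
rewritten, while counting once per clean-to-dirty transition does not, but
the latter then needs a correction when dirty pages are dropped by a
truncate.

```c
#include <string.h>

#define TOY_PAGE_SIZE 4096UL  /* assumed page size for the model */
#define TOY_NPAGES    4UL     /* the toy file is at most four pages long */

struct toy_task {
    unsigned long wchar;            /* bytes passed to write-style calls */
    unsigned long dirtied_bytes;    /* charged once per clean->dirty transition */
    unsigned long cancelled_bytes;  /* dirty pages dropped before writeback */
};

struct toy_file {
    int dirty[TOY_NPAGES];          /* one dirty flag per cached page */
};

/* Write len bytes at offset off. Writes past the end of the toy file are
 * ignored to keep the model small. */
void toy_write(struct toy_task *t, struct toy_file *f,
               unsigned long off, unsigned long len)
{
    unsigned long first, last, p;

    if (len == 0 || off + len > TOY_NPAGES * TOY_PAGE_SIZE)
        return;
    first = off / TOY_PAGE_SIZE;
    last = (off + len - 1) / TOY_PAGE_SIZE;

    t->wchar += len;                /* per-call counting */
    for (p = first; p <= last; p++) {
        if (!f->dirty[p]) {         /* per-page counting, first dirtying only */
            f->dirty[p] = 1;
            t->dirtied_bytes += TOY_PAGE_SIZE;
        }
    }
}

/* Truncate to zero length before writeback: the dirty pages will never
 * reach the disk, so they are recorded as cancelled. */
void toy_truncate(struct toy_task *t, struct toy_file *f)
{
    unsigned long p;

    for (p = 0; p < TOY_NPAGES; p++) {
        if (f->dirty[p]) {
            f->dirty[p] = 0;
            t->cancelled_bytes += TOY_PAGE_SIZE;
        }
    }
}
```

Rewriting the first page 100 times with full-page writes gives
wchar = 409600 but dirtied_bytes = 4096; truncating before writeback then
moves those 4096 bytes into cancelled_bytes.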

2006-09-28 18:57:26

by Jay Lan

Subject: Re: I/O statistics per process

Andrew Morton wrote:
> On Wed, 27 Sep 2006 23:22:02 +0200
> "roland" <[email protected]> wrote:
>
>>thanks. tried to contact redflag, but they don't answer. maybe support is
>>being on holiday.... !?
>>
>>linux kernel hackers - there is really no standard way to watch i/o metrics
>>(bytes read/written) at process level?
>
> The patch csa-accounting-taskstats-update.patch in current -mm kernels
> (which is planned for 2.6.19) does have per-process chars-read and
> chars-written accounting ("Extended accounting fields"). That's probably
> not what you really want, although it might tell you what you want to know.
>
>>it's extremely hard for the admin to track down, what process is hogging the
>>disk - especially if there is more than one task consuming cpu.

Roland,

The per-process chars-read and chars-written accounting is made
available through the taskstats interface (see Documentation/accounting/
taskstats.txt) in the 2.6.18-mm1 kernel. Unfortunately, the user-space CSA
package is still a few months away. You may, for now, write your
own taskstats application or go a long way to port the in-kernel
implementation of pagg/job/csa.

However, the "Extended accounting fields" patch does not give you a
straightforward answer. The patch provides accounting data only at
process termination (just like BSD accounting), and it seems that
you want to see which runaway application (i.e., still alive) is eating up
your disk. The taskstats interface offers a query mode (command-response),
but currently only delayacct uses that mode. We would need to make
those data available in query mode in order for an application to
see accounting data of live processes.

Of course, you can always kill all possible offenders. :) Then you
can find out which one was the bad guy from the data.


Thanks,
- jay


2006-09-28 19:10:13

by Andrew Morton

Subject: Re: I/O statistics per process

On Thu, 28 Sep 2006 11:55:38 -0700
Jay Lan <[email protected]> wrote:

> Roland,
>
> The per-process chars-read and chars-written accounting is made
> available through taskstats interface (see Documentation/accounting/
> taskstats.txt) in 2.6.18-mm1 kernel. Unfortunately, the user-space CSA
> package is still a few months away. You may, for now, write your
> own taskstats application or go a long way to port the in-kernel
> implementation of pagg/job/csa.
>
> However, the "Extended accounting fields" patch does not provide you
> straight forward answer. The patch provides accounting data only at
> process termination (just like the BSD accounting) and it seems that
> you want to see which run-away application (ie, alive) eating up your
> disk. The taskstats interface offers a query mode (command-response),
> but currently only delayacct uses that mode. We would need to make
> those data available in the query mode in order for application to
> see accounting data of live processes.

ow. That is a rather important enhancement to have.

> >
> > csa-accounting-taskstats-update.patch makes that information available to
> > userspace.
> >
> > But it's approximate, because
> >
> > - it doesn't account for disk readahead
> >
> > - it doesn't account for pagefault-initiated reads (although it easily
> > could - Jay?)
> >
> > - it overaccounts for a process writing to an already-dirty page.
> >
> > (We could fix this too: nuke the existing stuff and do
> >
> > current->wchar += PAGE_CACHE_SIZE;
> >
> > in __set_page_dirty_[no]buffers().) (But that ends up being wrong if
> > someone truncates the file before it got written)
> >
> > - it doesn't account for file readahead (although it easily could)
> >
> > - it doesn't account for pagefault-initiated readahead (it could)
> >
> >
> > hm. There's actually quite a lot we could do here to make these fields
> > more accurate and useful. A lot of this depends on what the definition of
> > these fields _is_. Is it just for disk IO? Is it supposed to include
> > console IO, or what?

I'd be interested in your opinions on all the above, please.

2006-09-28 20:05:07

by Roland

Subject: Re: I/O statistics per process

andrew, jay - thank you very much for giving your comments on this issue and
discussing this!

> I'd be interested in your opinions on all the above, please.

just kidding here a little bit (pardon!) - but from a user/admin perspective
we "just" want something like this:

http://www.my-vserver.de/stuff/linux/process-explorer.png (especially mind
the I/O columns in the middle of the screenshot)

awesome tool - but unfortunately mr. russinovich is working for the wrong
company/developing for the wrong operating system.... ;)

regards
roland






2006-09-28 22:02:11

by Jay Lan

Subject: Re: I/O statistics per process

Andrew Morton wrote:
> On Thu, 28 Sep 2006 11:55:38 -0700
> Jay Lan <[email protected]> wrote:
>
>> Roland,
>>
>> The per-process chars-read and chars-written accounting is made
>> available through taskstats interface (see Documentation/accounting/
>> taskstats.txt) in 2.6.18-mm1 kernel. Unfortunately, the user-space CSA
>> package is still a few months away. You may, for now, write your
>> own taskstats application or go a long way to port the in-kernel
>> implementation of pagg/job/csa.
>>
>> However, the "Extended accounting fields" patch does not provide you
>> straight forward answer. The patch provides accounting data only at
>> process termination (just like the BSD accounting) and it seems that
>> you want to see which run-away application (ie, alive) eating up your
>> disk. The taskstats interface offers a query mode (command-response),
>> but currently only delayacct uses that mode. We would need to make
>> those data available in the query mode in order for application to
>> see accounting data of live processes.
>
> ow. That is a rather important enhancement to have.

Yes, it is needed to provide accounting on live processes. Both
BSD and CSA traditionally focused on completed processes. I guess
that is the difference between system accounting and system
monitoring?

I certainly can make this enhancement. :)

>
>>> csa-accounting-taskstats-update.patch makes that information available to
>>> userspace.
>>>
>>> But it's approximate, because
>>>
>>> - it doesn't account for disk readahead
>>>
>>> - it doesn't account for pagefault-initiated reads (althought it easily
>>> could - Jay?)
>>>
>>> - it overaccounts for a process writing to an already-dirty page.
>>>
>>> (We could fix this too: nuke the existing stuff and do
>>>
>>> current->wchar += PAGE_CACHE_SIZE;
>>>
>>> in __set_page_dirty_[no]buffers().) (But that ends up being wrong if
>>> someone truncates the file before it got written)
>>>
>>> - it doesn't account for file readahead (although it easily could)
>>>
>>> - it doesn't account for pagefault-initiated readahead (it could)
>>>

Mmm, i am not a true FS I/O person. The data collection patches i
submitted in Nov 2004 were code i inherited, which has been
used in production systems by our CSA customers. We lost a bit in
content and accuracy when CSA was ported from IRIX to Linux. I am
sure there is room for improvement without much overhead. Maybe the FS
I/O guys can chip in?

>>>
>>> hm. There's actually quite a lot we could do here to make these fields
>>> more accurate and useful. A lot of this depends on what the definition of
>>> these fields _is_. Is it just for disk IO? Is it supposed to include
>>> console IO, or what?

Yes, the char_read and char_written are only for disk I/O.

>
> I'd be interested in your opinions on all the above, please.

Sorry, i cannot answer you on the data collection code.

Thanks,
- jay

>


2006-09-28 22:14:16

by Andrew Morton

Subject: Re: I/O statistics per process

On Thu, 28 Sep 2006 15:00:17 -0700
Jay Lan <[email protected]> wrote:

> >>> in __set_page_dirty_[no]buffers().) (But that ends up being wrong if
> >>> someone truncates the file before it got written)
> >>>
> >>> - it doesn't account for file readahead (although it easily could)
> >>>
> >>> - it doesn't account for pagefault-initiated readahead (it could)
> >>>
>
> Mmm, i am not a true FS I/O person. The data collection patches i
> submitted in Nov 2004 was the code i inherited and has been
> used in production system by our CSA customers. We lost a bit in
> contents and accuracy when CSA was ported from IRIX to Linux. I am
> sure there is room for improvement without much overhead.

OK, well it sounds like we're free to define these in any way we like. So
we actually get to make them mean something useful - how nice.

I hereby declare: "approximately equal to the number of filesystem bytes
which this task has caused to occur, or which shall occur in the near
future".

> Maybe FS
> I/O guys can chip in?

I used to be one of them. I can take a look at doing this. Given the lack
of any application to read the darn numbers out I guess I'll need to expose
them in /proc for now. Yes, that monitoring patch (and an application to
read from it!) would be appreciated, thanks.
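
The thread does not say what the /proc file would look like. As a sketch,
the reader below assumes a per-process file /proc/<pid>/io made of
"name: value" lines with fields such as rchar, wchar, read_bytes,
write_bytes and cancelled_write_bytes, which is the layout later mainline
kernels use; treat the path and field names as assumptions on a kernel that
carries only the -mm patches. parse_io_text() and read_proc_io() are helper
names invented here.

```c
#include <stdio.h>
#include <string.h>

struct io_counters {
    unsigned long long rchar;
    unsigned long long wchar;
    unsigned long long read_bytes;
    unsigned long long write_bytes;
    unsigned long long cancelled_write_bytes;
};

/* Parse "name: value" lines from a text buffer. Unknown names (for example
 * syscr/syscw) are skipped. Returns the number of recognised fields. */
int parse_io_text(const char *text, struct io_counters *out)
{
    char key[64];
    unsigned long long val;
    int found = 0;
    const char *line = text;

    memset(out, 0, sizeof(*out));
    while (line != NULL && *line != '\0') {
        if (sscanf(line, "%63[^:\n]: %llu", key, &val) == 2) {
            if (strcmp(key, "rchar") == 0) {
                out->rchar = val; found++;
            } else if (strcmp(key, "wchar") == 0) {
                out->wchar = val; found++;
            } else if (strcmp(key, "read_bytes") == 0) {
                out->read_bytes = val; found++;
            } else if (strcmp(key, "write_bytes") == 0) {
                out->write_bytes = val; found++;
            } else if (strcmp(key, "cancelled_write_bytes") == 0) {
                out->cancelled_write_bytes = val; found++;
            }
        }
        line = strchr(line, '\n');   /* advance to the next line, if any */
        if (line != NULL)
            line++;
    }
    return found;
}

/* Read and parse /proc/<pid>/io (assumed path). Returns -1 if the file
 * cannot be opened, otherwise the number of recognised fields. */
int read_proc_io(int pid, struct io_counters *out)
{
    char path[64];
    char buf[1024];
    size_t n;
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%d/io", pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, fp);
    fclose(fp);
    buf[n] = '\0';
    return parse_io_text(buf, out);
}
```

Keeping the parser separate from the file access makes it easy to check
against a canned buffer, and reading another user's process typically
requires sufficient privileges.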

2006-12-08 00:09:54

by Roland

Subject: Re: I/O statistics per process

hi!

i haven't seen anything new about this (andrew? jay?), or whether
some other person sent a patch, but i'd like to report that i came across a
really nice tool which would immediately benefit from a per-process i/o
statistics feature.

please - this mail is not meant to clamor for such a feature - it's just to
show some more benefits this feature would bring if it existed.

vmktree at http://vmktree.org/ is a really nice monitoring tool that is
able to graph performance statistics for a host running vmware virtual
machines (closed source - evil - i know ;) - and it can break those
statistics down to each virtual machine.

what hurts most here is that you have no clue how much I/O each of
those virtual machines is generating - you may make sort of a "guess" that a
machine with 100% idle cpu will not generate any significant amount of I/O,
but vmktree would be so much more powerful with a per-process I/O statistic,
since you could use your systems more efficiently, because you would know
about your I/O hogs, too.

having the ability to take such information from /proc would be a real
killer feature - good for troubleshooting and also good for getting
important statistics!
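
What a top-like tool or a grapher would do with such counters can be
sketched as follows: sample the cumulative byte counters twice and turn the
difference into a rate. The io_sample struct and both function names are
hypothetical, and the counters are assumed to be cumulative read and write
byte totals per process.

```c
#include <stddef.h>

/* One reading of a process's cumulative I/O counters (assumed layout). */
struct io_sample {
    int pid;
    unsigned long long read_bytes;
    unsigned long long write_bytes;
};

/* Combined read+write rate in bytes per second between two samples of the
 * same pid. Returns 0 for a non-positive interval, mismatched pids, or
 * counters that went backwards (for example after pid reuse). */
double io_rate(const struct io_sample *before, const struct io_sample *after,
               double interval_sec)
{
    unsigned long long b, a;

    if (interval_sec <= 0.0 || before->pid != after->pid)
        return 0.0;
    b = before->read_bytes + before->write_bytes;
    a = after->read_bytes + after->write_bytes;
    if (a < b)
        return 0.0;
    return (double)(a - b) / interval_sec;
}

/* Index of the entry with the highest rate, or -1 if n is 0. The two
 * arrays are expected to list the same processes in the same order. */
int busiest(const struct io_sample *before, const struct io_sample *after,
            size_t n, double interval_sec)
{
    size_t i;
    int best = -1;
    double best_rate = -1.0;

    for (i = 0; i < n; i++) {
        double r = io_rate(&before[i], &after[i], interval_sec);
        if (r > best_rate) {
            best_rate = r;
            best = (int)i;
        }
    }
    return best;
}
```

A monitoring loop would fill the two arrays from consecutive polls, sort or
pick by rate, and display the result as an extra column per process or per
virtual machine.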

roland

ps:
this is another person desperately seeking a tool providing that
information: http://www.tek-tips.com/viewthread.cfm?qid=1284288&page=4





2006-12-08 08:53:25

by Roland

Subject: Re: I/O statistics per process

this is really great news!

thank you!


----- Original Message -----
From: "Fengguang Wu" <[email protected]>
To: "roland" <[email protected]>
Cc: "Andrew Morton" <[email protected]>; "Jay Lan" <[email protected]>;
<[email protected]>; <[email protected]>
Sent: Friday, December 08, 2006 2:22 AM
Subject: Re: I/O statistics per process


> Hi,
>
> On Fri, Dec 08, 2006 at 01:09:01AM +0100, roland wrote:
>>
>> didn`t discover that there is anything new about this (andrew? jay?) or
>> if
>> some other person sent a patch , but i`d like to report that i came
>> across
>> a really nice tool which would immediately benefit from per-process i/o
>> statistics feature.
>
> Andrew has added kernel support to it in -mm tree.
> Check this commit log:
> http://www.mail-archive.com/[email protected]/msg02975.html
>
> io-accounting-core-statistics.patch
> io-accounting-write-accounting.patch
> io-accounting-write-cancel-accounting.patch
> io-accounting-read-accounting-2.patch
> io-accounting-read-accounting-nfs-fix.patch
> io-accounting-read-accounting-cifs-fix.patch
> io-accounting-direct-io.patch
> io-accounting-report-in-procfs.patch
> io-accounting-via-taskstats.patch
> io-accounting-add-to-getdelays.patch
>
> Regards,
> Fengguang Wu

2006-12-08 01:22:12

by Wu Fengguang

Subject: Re: I/O statistics per process

Hi,

On Fri, Dec 08, 2006 at 01:09:01AM +0100, roland wrote:
>
> didn`t discover that there is anything new about this (andrew? jay?) or if
> some other person sent a patch , but i`d like to report that i came across
> a really nice tool which would immediately benefit from per-process i/o
> statistics feature.

Andrew has added kernel support for it in the -mm tree.
Check this commit log:
http://www.mail-archive.com/[email protected]/msg02975.html

io-accounting-core-statistics.patch
io-accounting-write-accounting.patch
io-accounting-write-cancel-accounting.patch
io-accounting-read-accounting-2.patch
io-accounting-read-accounting-nfs-fix.patch
io-accounting-read-accounting-cifs-fix.patch
io-accounting-direct-io.patch
io-accounting-report-in-procfs.patch
io-accounting-via-taskstats.patch
io-accounting-add-to-getdelays.patch

Regards,
Fengguang Wu