As part of the new resource counters I'm adding for our job
accounting feature, I'm trying to gather blocks read and written
on a per task basis (which will get rolled into a per job statistic
outside of the kernel). I added task struct counters which get
incremented in drivers/block/ll_rw_blk.c in the drive_stat_acct() procedure,
which is where the kernel stats are gathered for blocks read/written.
What I noticed, though, was that some system daemons would be
the current task when going through drive_stat_acct() for writes, but
usually it was kupdate. After some searching, I now see that kupdate
is the kernel thread responsible for flushing the dirty buffers out to disk.
I need to capture this blocks read/written information for user
processes, not kupdate.
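To make the effect concrete, here is a small user-space sketch (not kernel code) of what charging the current task at block-request time does to buffered writes. Everything in it is a hypothetical stand-in: struct task, current_task, charge_current(), buffered_write() and flush_dirty() are illustration names only, and the real drive_stat_acct() has a different signature.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's task struct; only the
 * counters this sketch needs. */
struct task {
    const char *comm;
    unsigned long blocks_read;
    unsigned long blocks_written;
};

enum { READ_REQ = 0, WRITE_REQ = 1 };

/* Stand-in for the kernel's notion of "current". */
static struct task *current_task;

/* Number of blocks dirtied in the cache but not yet submitted. */
static unsigned long dirty_blocks;

/* Mimics the idea of counting at request time: whichever task is
 * current when the request reaches the block layer is charged. */
static void charge_current(int rw, unsigned long nr_blocks)
{
    if (rw == WRITE_REQ)
        current_task->blocks_written += nr_blocks;
    else
        current_task->blocks_read += nr_blocks;
}

/* A buffered write only dirties buffers; no block request is issued
 * in the writer's context, so nothing is charged here. */
static void buffered_write(unsigned long nr_blocks)
{
    dirty_blocks += nr_blocks;
}

/* Later the flusher submits the dirty buffers in its own context,
 * so it is the one that gets charged. */
static void flush_dirty(void)
{
    charge_current(WRITE_REQ, dirty_blocks);
    dirty_blocks = 0;
}

/* One user task reads 4 blocks and writes 8 through the cache; the
 * flusher thread then writes them out. */
static void demo_misattribution(struct task *user, struct task *flusher)
{
    current_task = user;
    charge_current(READ_REQ, 4);
    buffered_write(8);

    current_task = flusher;
    flush_dirty();
}
```

Calling demo_misattribution() with one zeroed struct task for the user process and one for kupdate leaves the user task with 4 blocks read and 0 written, while all 8 written blocks land on kupdate.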
We were able to get this block info on the Cray systems (which
have separate direct IO and buffered IO) and somewhat on the IRIX
systems (where whichever process is running when the buffers get flushed
is the process which gets charged for those buffers). Apparently, this
information is useful for capacity planning and in cache thrashing
situations. It's not used for billing purposes.
Is there a reasonable way on Linux to get any blocks read and
written information on a per task basis? Thanks for any help.
----
Marlys Kohnke Silicon Graphics Inc.
[email protected] 655F Lone Oak Drive
(651)683-5324 Eagan, MN 55121
Marlys Kohnke wrote:
>
> As part of the new resource counters I'm adding for our job
> accounting feature, I'm trying to gather blocks read and written
> on a per task basis (which will get rolled into a per job statistic
> outside of the kernel). I added task struct counters which get
> incremented in drivers/block/ll_rw_blk.c in the drive_stat_acct() procedure,
> which is where the kernel stats are gathered for blocks read/written.
>
> What I noticed, though, was that some system daemons would be
> the current task when going through drive_stat_acct() for writes, but
> usually it was kupdate. In doing some searching, I now see that kupdate
> is the kernel thread responsible to flush the dirty buffers out to disk.
> I need to capture this blocks read/written information for user
> processes, not kupdate.
>
> We were able to get this block info on the Cray systems (which
> have separate direct IO and buffered IO) and somewhat on the IRIX
> systems (where whichever process is running when the buffers get flushed
> is the process which gets charged for those buffers). Apparently, this
> information is useful for capacity planning and in cache thrashing
> situations. It's not used for billing purposes.
>
> Is there a reasonable way on Linux to get any blocks read and
> written information on a per task basis? Thanks for any help.
In general it is impossible to assign all block reads/writes to
particular tasks because many tasks may use the same block. Maybe you
want to track block requests instead? These happen in the task context,
but the paths are scattered. Metadata is generally read using
ll_rw_block but written using mark_buffer_dirty (and the actual write
will be initiated by kflush or kupdate later). Sometimes ll_rw_block is
used for writing, usually for synchronous operations. The paths for
page cache data - most of your data - are even more scattered. To track
them down, look for _find_page and variants.
Reads are associated pretty closely in time with the requesting process
but writes are not, so charging the transfers to the most recent
userspace task might be ok for reads - it doesn't work very well at all
for writes.
It doesn't sound very promising, does it? You might consider tracking
block transfers by inode (easy for data, harder for metadata) and moving
the counts into per-task totals on file close.
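A rough user-space sketch of that last idea, just to show the bookkeeping. The names here (struct inode_io, struct task_io, inode_account(), fold_on_close()) are made up for illustration and do not correspond to existing kernel structures or hooks.

```c
#include <assert.h>

/* Hypothetical per-inode transfer counters. */
struct inode_io {
    unsigned long blocks_read;
    unsigned long blocks_written;
};

/* Hypothetical per-task totals. */
struct task_io {
    unsigned long blocks_read;
    unsigned long blocks_written;
};

/* Would be called wherever a block belonging to this inode is
 * actually transferred, no matter which task is current at the time
 * (a user process for most reads, a flusher thread for most writes). */
static void inode_account(struct inode_io *ino, int is_write,
                          unsigned long nr_blocks)
{
    if (is_write)
        ino->blocks_written += nr_blocks;
    else
        ino->blocks_read += nr_blocks;
}

/* On file close, move the inode's counts into the closing task's
 * totals and zero them so they are not charged a second time. */
static void fold_on_close(struct task_io *task, struct inode_io *ino)
{
    task->blocks_read += ino->blocks_read;
    task->blocks_written += ino->blocks_written;
    ino->blocks_read = 0;
    ino->blocks_written = 0;
}
```

Note what this simple version gets wrong: if several tasks have the file open, whoever closes it takes everything counted so far, and writes that are flushed after the close are not charged until some later close of the same inode, possibly to a different task.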
--
Daniel